Search Results: "uwe"

10 May 2024

Reproducible Builds: Reproducible Builds in April 2024

Welcome to the April 2024 report from the Reproducible Builds project! In our reports, we attempt to outline what we have been up to over the past month, as well as mentioning some of the important things happening more generally in software supply-chain security. As ever, if you are interested in contributing to the project, please visit our Contribute page on our website. Table of contents:
  1. New backseat-signed tool to validate distributions source inputs
  2. NixOS is not reproducible
  3. Certificate vulnerabilities in F-Droid s fdroidserver
  4. Website updates
  5. Reproducible Builds and Insights from an Independent Verifier for Arch Linux
  6. libntlm now releasing minimal source-only tarballs
  7. Distribution work
  8. Mailing list news
  9. diffoscope
  10. Upstream patches
  11. reprotest
  12. Reproducibility testing framework

New backseat-signed tool to validate distributions source inputs kpcyrd announced a new tool called backseat-signed, after:
I figured out a somewhat straight-forward way to check if a given git archive output is cryptographically claimed to be the source input of a given binary package in either Arch Linux or Debian (or both).
Elaborating more in their announcement post, kpcyrd writes:
I believe this to be the reproducible source tarball thing some people have been asking about. As explained in the README, I believe reproducing autotools-generated tarballs isn t worth everybody s time and instead a distribution that claims to build from source should operate on VCS snapshots instead of tarballs with 25k lines of pre-generated shell-script.
Indeed, many distributions packages already build from VCS snapshots, and this trend is likely to accelerate in response to the xz incident. The announcement led to a lengthy discussion on our mailing list, as well as shorter followup thread from kpcyrd about bootstrapping Autotools projects.

NixOS is not reproducible Morten Linderud posted an post on his blog this month, provocatively titled, NixOS is not reproducible . Although quickly admitting that his title is indeed clickbait , Morten goes on to clarify the precise guarantees and promises that NixOS provides its users. Later in the most, Morten mentions that he was motivated to write the post because:
I have heavily invested my free-time on this topic since 2017, and met some of the accomplishments we have had with Doesn t NixOS solve this? for just as long and I thought it would be of peoples interest to clarify[.]

Certificate vulnerabilities in F-Droid s fdroidserver In early April, Fay Stegerman announced a certificate pinning bypass vulnerability and Proof of Concept (PoC) in the F-Droid fdroidserver tools for managing builds, indexes, updates, and deployments for F-Droid repositories to the oss-security mailing list.
We observed that embedding a v1 (JAR) signature file in an APK with minSdk >= 24 will be ignored by Android/apksigner, which only checks v2/v3 in that case. However, since fdroidserver checks v1 first, regardless of minSdk, and does not verify the signature, it will accept a fake certificate and see an incorrect certificate fingerprint. [ ] We also realised that the above mentioned discrepancy between apksigner and androguard (which fdroidserver uses to extract the v2/v3 certificates) can be abused here as well. [ ]
Later on in the month, Fay followed up with a second post detailing a third vulnerability and a script that could be used to scan for potentially affected .apk files and mentioned that, whilst upstream had acknowledged the vulnerability, they had not yet applied any ameliorating fixes.

Website updates There were a number of improvements made to our website this month, including Chris Lamb updating the archive page to recommend -X and unzipping with TZ=UTC [ ] and adding Maven, Gradle, JDK and Groovy examples to the SOURCE_DATE_EPOCH page [ ]. In addition Jan Zerebecki added a new /contribute/opensuse/ page [ ] and Sertonix fixed the automatic RSS feed detection [ ][ ].

Reproducible Builds and Insights from an Independent Verifier for Arch Linux Joshua Drexel, Esther H nggi and Iy n M ndez Veiga of the School of Computer Science and Information Technology, Hochschule Luzern (HSLU) in Switzerland published a paper this month entitled Reproducible Builds and Insights from an Independent Verifier for Arch Linux. The paper establishes the context as follows:
Supply chain attacks have emerged as a prominent cybersecurity threat in recent years. Reproducible and bootstrappable builds have the potential to reduce such attacks significantly. In combination with independent, exhaustive and periodic source code audits, these measures can effectively eradicate compromises in the building process. In this paper we introduce both concepts, we analyze the achievements over the last ten years and explain the remaining challenges.
What is more, the paper aims to:
contribute to the reproducible builds effort by setting up a rebuilder and verifier instance to test the reproducibility of Arch Linux packages. Using the results from this instance, we uncover an unnoticed and security-relevant packaging issue affecting 16 packages related to Certbot [ ].
A PDF of the paper is available.

libntlm now releasing minimal source-only tarballs Simon Josefsson wrote on his blog this month that, going forward, the libntlm project will now be releasing what they call minimal source-only tarballs :
The XZUtils incident illustrate that tarballs with files that are not included in the git archive offer an opportunity to disguise malicious backdoors. [The] risk of hiding malware is not the only motivation to publish signed minimal source-only tarballs. With pre-generated content in tarballs, there is a risk that GNU/Linux distributions [ship] generated files coming from the tarball into the binary *.deb or *.rpm package file. Typically the person packaging the upstream project never realized that some installed artifacts was not re-built[.]
Simon s post goes into further details how this was achieved, and describes some potential caveats and counters some expected responses as well. A shorter version can be found in the announcement for the 1.8 release of libntlm.

Distribution work In Debian this month, Helmut Grohne filed a bug suggesting the removal of dh-buildinfo, a tool to generate and distribute .buildinfo-like files within binary packages. Note that this is distinct from the .buildinfo generation performed by dpkg-genbuildinfo. By contrast, the entirely optional dh-buildinfo generated a debian/buildinfo file that would be shipped within binary packages as /usr/share/doc/package/buildinfo_$arch.gz. Adrian Bunk recently asked about including source hashes in Debian s .buildinfo files, which prompted Guillem Jover to refresh some old patches to dpkg to make this possible, which revealed some quirks Vagrant Cascadian discovered when testing. In addition, 21 reviews of Debian packages were added, 22 were updated and 16 were removed this month adding to our knowledge about identified issues. A number issue types have been added, such as new random_temporary_filenames_embedded_by_mesonpy and timestamps_added_by_librime toolchain issues. In openSUSE, it was announced that their Factory distribution enabled bit-by-bit reproducible builds for almost all parts of the package. Previously, more parts needed to be ignored when comparing package files, but now only the signature needs to be deleted. In addition, Bernhard M. Wiedemann published theunreproduciblepackage as a proper .rpm package which it allows to better test tools intended to debug reproducibility. Furthermore, it was announced that Bernhard s work on a 100% reproducible openSUSE-based distribution will be funded by NLnet. He also posted another monthly report for his reproducibility work in openSUSE. In GNU Guix, Janneke Nieuwenhuizen submitted a patch set for creating a reproducible source tarball for Guix. That is to say, ensuring that make dist is reproducible when run from Git. [ ] Lastly, in Fedora, a new wiki page was created to propose a change to the distribution. Titled Changes/ReproduciblePackageBuilds , the page summarises itself as a proposal whereby A post-build cleanup is integrated into the RPM build process so that common causes of build irreproducibility in packages are removed, making most of Fedora packages reproducible.

Mailing list news On our mailing list this month:
  • Continuing a thread started in March 2024 about the Arch Linux minimal container now being 100% reproducible, John Gilmore followed up with a post about the practical and philosophical distinctions of local vs. remote storage of the various artifacts needed to build packages.
  • Chris Lamb asked the list which conferences readers are attending these days: After peak Covid and other industry-wide changes, conferences are no longer the must attend events they previously were especially in the area of software supply-chain security. In rough, practical terms, it seems harder to justify conference travel today than it did in mid-2019. The thread generated a number of responses which would be of interest to anyone planning travel in Q3 and Q4 of 2024.
  • James Addison wrote to the list about a quirk in Git related to its core.autocrlf functionality, thus helpfully passing on a slightly off-topic and perhaps not of direct relevance to anyone on the list today note that might still be the kind of issue that is useful to be aware of if-and-when puzzling over unexpected git content / checksum issues (situations that I do expect people on this list encounter from time-to-time) .

diffoscope diffoscope is our in-depth and content-aware diff utility that can locate and diagnose reproducibility issues. This month, Chris Lamb made a number of changes such as uploading versions 263, 264 and 265 to Debian and made the following additional changes:
  • Don t crash on invalid .zip files, even if we encounter their badness halfway through the file and not at the time of their initial opening. [ ]
  • Prevent odt2txt tests from always being skipped due to an (impossibly) new version requirement. [ ]
  • Avoid parens-in-parens in test skipping messages. [ ]
  • Ensure that tests with >=-style version constraints actually print the tool name. [ ]
In addition, Fay Stegerman fixed a crash when there are (invalid) duplicate entries in .zip which was originally reported in Debian bug #1068705). [ ] Fay also added a user-visible note to a diff when there are duplicate entries in ZIP files [ ]. Lastly, Vagrant Cascadian added an external tool pointer for the zipdetails tool under GNU Guix [ ] and proposed updates to diffoscope in Guix as well [ ] which were merged as [264] [265], fixed a regression in test coverage and increased verbosity of the test suite[ ].

Upstream patches The Reproducible Builds project detects, dissects and attempts to fix as many currently-unreproducible packages as possible. We endeavour to send all of our patches upstream where appropriate. This month, we wrote a large number of such patches, including:

reprotest reprotest is our tool for building the same source code twice in different environments and then checking the binaries produced by each build for any differences. This month, reprotest version 0.7.27 was uploaded to Debian unstable) by Vagrant Cascadian who made the following additional changes:
  • Enable specific number of CPUs using --vary=num_cpus.cpus=X. [ ]
  • Consistently use 398 days for time variation, rather than choosing randomly each time. [ ]
  • Disable builds of arch:any packages. [ ]
  • Update the description for the build_path.path option in README.rst. [ ]
  • Update escape sequences for compatibility with Python 3.12. (#1068853). [ ]
  • Remove the generic upstream signing-key [ ] and update the packages signing key with the currently active team members [ ].
  • Update the packaging Standards-Version to 4.7.0. [ ]
In addition, Holger Levsen fixed some spelling errors detected by the spellintian tool [ ] and Vagrant Cascadian updated reprotest in GNU Guix to 0.7.27.

Reproducibility testing framework The Reproducible Builds project operates a comprehensive testing framework running primarily at tests.reproducible-builds.org in order to check packages and other artifacts for reproducibility. In April, an enormous number of changes were made by Holger Levsen:
  • Debian-related changes:
    • Adjust for changed internal IP addresses at Codethink. [ ]
    • Automatically cleanup failed diffoscope user services if there are too many failures. [ ][ ]
    • Configure two new nodes at infomanik.cloud. [ ][ ]
    • Schedule Debian experimental even less. [ ][ ]
  • Breakage detection:
    • Exclude currently building packages from breakage detection. [ ]
    • Be more noisy if diffoscope crashes. [ ]
    • Health check: provide clickable URLs in jenkins job log for failed pkg builds due to diffoscope crashes. [ ]
    • Limit graph to about the last 100 days of breakages only. [ ]
    • Fix all found files with bad permissions. [ ]
    • Prepare dealing with diffoscope timeouts. [ ]
    • Detect more cases of failure to debootstrap base system. [ ]
    • Include timestamps of failed job runs. [ ]
  • Documentation updates:
    • Document how to access arm64 nodes at Codethink. [ ]
    • Document how to use infomaniak.cloud. [ ]
    • Drop notes about long stalled LeMaker HiKey960 boards sponsored by HPE and hosted at ETH. [ ]
    • Mention osuosl4 and osuosl5 and explain their usage. [ ]
    • Mention that some packages are built differently. [ ][ ]
    • Improve language in a comment. [ ]
    • Add more notes how to query resource usage from infomaniak.cloud. [ ]
  • Node maintenance:
    • Add ionos4 and ionos14 to THANKS. [ ][ ][ ][ ][ ]
    • Deprecate Squid on ionos1 and ionos10. [ ]
    • Drop obsolete script to powercycle arm64 architecture nodes. [ ]
    • Update system_health_check for new proxy nodes. [ ]
  • Misc changes:
    • Make the update_jdn.sh script more robust. [ ][ ]
    • Update my SSH public key. [ ]
In addition, Mattia Rizzolo added some new host details. [ ]

If you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. However, you can get in touch with us via:

13 April 2024

Simon Josefsson: Reproducible and minimal source-only tarballs

With the release of Libntlm version 1.8 the release tarball can be reproduced on several distributions. We also publish a signed minimal source-only tarball, produced by git-archive which is the same format used by Savannah, Codeberg, GitLab, GitHub and others. Reproducibility of both tarballs are tested continuously for regressions on GitLab through a CI/CD pipeline. If that wasn t enough to excite you, the Debian packages of Libntlm are now built from the reproducible minimal source-only tarball. The resulting binaries are reproducible on several architectures. What does that even mean? Why should you care? How you can do the same for your project? What are the open issues? Read on, dear reader This article describes my practical experiments with reproducible release artifacts, following up on my earlier thoughts that lead to discussion on Fosstodon and a patch by Janneke Nieuwenhuizen to make Guix tarballs reproducible that inspired me to some practical work. Let s look at how a maintainer release some software, and how a user can reproduce the released artifacts from the source code. Libntlm provides a shared library written in C and uses GNU Make, GNU Autoconf, GNU Automake, GNU Libtool and gnulib for build management, but these ideas should apply to most project and build system. The following illustrate the steps a maintainer would take to prepare a release:
git clone https://gitlab.com/gsasl/libntlm.git
cd libntlm
git checkout v1.8
./bootstrap
./configure
make distcheck
gpg -b libntlm-1.8.tar.gz
The generated files libntlm-1.8.tar.gz and libntlm-1.8.tar.gz.sig are published, and users download and use them. This is how the GNU project have been doing releases since the late 1980 s. That is a testament to how successful this pattern has been! These tarballs contain source code and some generated files, typically shell scripts generated by autoconf, makefile templates generated by automake, documentation in formats like Info, HTML, or PDF. Rarely do they contain binary object code, but historically that happened. The XZUtils incident illustrate that tarballs with files that are not included in the git archive offer an opportunity to disguise malicious backdoors. I blogged earlier how to mitigate this risk by using signed minimal source-only tarballs. The risk of hiding malware is not the only motivation to publish signed minimal source-only tarballs. With pre-generated content in tarballs, there is a risk that GNU/Linux distributions such as Trisquel, Guix, Debian/Ubuntu or Fedora ship generated files coming from the tarball into the binary *.deb or *.rpm package file. Typically the person packaging the upstream project never realized that some installed artifacts was not re-built through a typical autoconf -fi && ./configure && make install sequence, and never wrote the code to rebuild everything. This can also happen if the build rules are written but are buggy, shipping the old artifact. When a security problem is found, this can lead to time-consuming situations, as it may be that patching the relevant source code and rebuilding the package is not sufficient: the vulnerable generated object from the tarball would be shipped into the binary package instead of a rebuilt artifact. For architecture-specific binaries this rarely happens, since object code is usually not included in tarballs although for 10+ years I shipped the binary Java JAR file in the GNU Libidn release tarball, until I stopped shipping it. For interpreted languages and especially for generated content such as HTML, PDF, shell scripts this happens more than you would like. Publishing minimal source-only tarballs enable easier auditing of a project s code, to avoid the need to read through all generated files looking for malicious content. I have taken care to generate the source-only minimal tarball using git-archive. This is the same format that GitLab, GitHub etc offer for the automated download links on git tags. The minimal source-only tarballs can thus serve as a way to audit GitLab and GitHub download material! Consider if/when hosting sites like GitLab or GitHub has a security incident that cause generated tarballs to include a backdoor that is not present in the git repository. If people rely on the tag download artifact without verifying the maintainer PGP signature using GnuPG, this can lead to similar backdoor scenarios that we had for XZUtils but originated with the hosting provider instead of the release manager. This is even more concerning, since this attack can be mounted for some selected IP address that you want to target and not on everyone, thereby making it harder to discover. With all that discussion and rationale out of the way, let s return to the release process. I have added another step here:
make srcdist
gpg -b libntlm-1.8-src.tar.gz
Now the release is ready. I publish these four files in the Libntlm s Savannah Download area, but they can be uploaded to a GitLab/GitHub release area as well. These are the SHA256 checksums I got after building the tarballs on my Trisquel 11 aramo laptop:
91de864224913b9493c7a6cec2890e6eded3610d34c3d983132823de348ec2ca  libntlm-1.8-src.tar.gz
ce6569a47a21173ba69c990965f73eb82d9a093eb871f935ab64ee13df47fda1  libntlm-1.8.tar.gz
So how can you reproduce my artifacts? Here is how to reproduce them in a Ubuntu 22.04 container:
podman run -it --rm ubuntu:22.04
apt-get update
apt-get install -y --no-install-recommends autoconf automake libtool make git ca-certificates
git clone https://gitlab.com/gsasl/libntlm.git
cd libntlm
git checkout v1.8
./bootstrap
./configure
make dist srcdist
sha256sum libntlm-*.tar.gz
You should see the exact same SHA256 checksum values. Hooray! This works because Trisquel 11 and Ubuntu 22.04 uses the same version of git, autoconf, automake, and libtool. These tools do not guarantee the same output content for all versions, similar to how GNU GCC does not generate the same binary output for all versions. So there is still some delicate version pairing needed. Ideally, the artifacts should be possible to reproduce from the release artifacts themselves, and not only directly from git. It is possible to reproduce the full tarball in a AlmaLinux 8 container replace almalinux:8 with rockylinux:8 if you prefer RockyLinux:
podman run -it --rm almalinux:8
dnf update -y
dnf install -y make wget gcc
wget https://download.savannah.nongnu.org/releases/libntlm/libntlm-1.8.tar.gz
tar xfa libntlm-1.8.tar.gz
cd libntlm-1.8
./configure
make dist
sha256sum libntlm-1.8.tar.gz
The source-only minimal tarball can be regenerated on Debian 11:
podman run -it --rm debian:11
apt-get update
apt-get install -y --no-install-recommends make git ca-certificates
git clone https://gitlab.com/gsasl/libntlm.git
cd libntlm
git checkout v1.8
make -f cfg.mk srcdist
sha256sum libntlm-1.8-src.tar.gz 
As the Magnus Opus or chef-d uvre, let s recreate the full tarball directly from the minimal source-only tarball on Trisquel 11 replace docker.io/kpengboy/trisquel:11.0 with ubuntu:22.04 if you prefer.
podman run -it --rm docker.io/kpengboy/trisquel:11.0
apt-get update
apt-get install -y --no-install-recommends autoconf automake libtool make wget git ca-certificates
wget https://download.savannah.nongnu.org/releases/libntlm/libntlm-1.8-src.tar.gz
tar xfa libntlm-1.8-src.tar.gz
cd libntlm-v1.8
./bootstrap
./configure
make dist
sha256sum libntlm-1.8.tar.gz
Yay! You should now have great confidence in that the release artifacts correspond to what s in version control and also to what the maintainer intended to release. Your remaining job is to audit the source code for vulnerabilities, including the source code of the dependencies used in the build. You no longer have to worry about auditing the release artifacts. I find it somewhat amusing that the build infrastructure for Libntlm is now in a significantly better place than the code itself. Libntlm is written in old C style with plenty of string manipulation and uses broken cryptographic algorithms such as MD4 and single-DES. Remember folks: solving supply chain security issues has no bearing on what kind of code you eventually run. A clean gun can still shoot you in the foot. Side note on naming: GitLab exports tarballs with pathnames libntlm-v1.8/ (i.e.., PROJECT-TAG/) and I ve adopted the same pathnames, which means my libntlm-1.8-src.tar.gz tarballs are bit-by-bit identical to GitLab s exports and you can verify this with tools like diffoscope. GitLab name the tarball libntlm-v1.8.tar.gz (i.e., PROJECT-TAG.ARCHIVE) which I find too similar to the libntlm-1.8.tar.gz that we also publish. GitHub uses the same git archive style, but unfortunately they have logic that removes the v in the pathname so you will get a tarball with pathname libntlm-1.8/ instead of libntlm-v1.8/ that GitLab and I use. The content of the tarball is bit-by-bit identical, but the pathname and archive differs. Codeberg (running Forgejo) uses another approach: the tarball is called libntlm-v1.8.tar.gz (after the tag) just like GitLab, but the pathname inside the archive is libntlm/, otherwise the produced archive is bit-by-bit identical including timestamps. Savannah s CGIT interface uses archive name libntlm-1.8.tar.gz with pathname libntlm-1.8/, but otherwise file content is identical. Savannah s GitWeb interface provides snapshot links that are named after the git commit (e.g., libntlm-a812c2ca.tar.gz with libntlm-a812c2ca/) and I cannot find any tag-based download links at all. Overall, we are so close to get SHA256 checksum to match, but fail on pathname within the archive. I ve chosen to be compatible with GitLab regarding the content of tarballs but not on archive naming. From a simplicity point of view, it would be nice if everyone used PROJECT-TAG.ARCHIVE for the archive filename and PROJECT-TAG/ for the pathname within the archive. This aspect will probably need more discussion. Side note on git archive output: It seems different versions of git archive produce different results for the same repository. The version of git in Debian 11, Trisquel 11 and Ubuntu 22.04 behave the same. The version of git in Debian 12, AlmaLinux/RockyLinux 8/9, Alpine, ArchLinux, macOS homebrew, and upcoming Ubuntu 24.04 behave in another way. Hopefully this will not change that often, but this would invalidate reproducibility of these tarballs in the future, forcing you to use an old git release to reproduce the source-only tarball. Alas, GitLab and most other sites appears to be using modern git so the download tarballs from them would not match my tarballs even though the content would. Side note on ChangeLog: ChangeLog files were traditionally manually curated files with version history for a package. In recent years, several projects moved to dynamically generate them from git history (using tools like git2cl or gitlog-to-changelog). This has consequences for reproducibility of tarballs: you need to have the entire git history available! The gitlog-to-changelog tool also output different outputs depending on the time zone of the person using it, which arguable is a simple bug that can be fixed. However this entire approach is incompatible with rebuilding the full tarball from the minimal source-only tarball. It seems Libntlm s ChangeLog file died on the surgery table here. So how would a distribution build these minimal source-only tarballs? I happen to help on the libntlm package in Debian. It has historically used the generated tarballs as the source code to build from. This means that code coming from gnulib is vendored in the tarball. When a security problem is discovered in gnulib code, the security team needs to patch all packages that include that vendored code and rebuild them, instead of merely patching the gnulib package and rebuild all packages that rely on that particular code. To change this, the Debian libntlm package needs to Build-Depends on Debian s gnulib package. But there was one problem: similar to most projects that use gnulib, Libntlm depend on a particular git commit of gnulib, and Debian only ship one commit. There is no coordination about which commit to use. I have adopted gnulib in Debian, and add a git bundle to the *_all.deb binary package so that projects that rely on gnulib can pick whatever commit they need. This allow an no-network GNULIB_URL and GNULIB_REVISION approach when running Libntlm s ./bootstrap with the Debian gnulib package installed. Otherwise libntlm would pick up whatever latest version of gnulib that Debian happened to have in the gnulib package, which is not what the Libntlm maintainer intended to be used, and can lead to all sorts of version mismatches (and consequently security problems) over time. Libntlm in Debian is developed and tested on Salsa and there is continuous integration testing of it as well, thanks to the Salsa CI team. Side note on git bundles: unfortunately there appears to be no reproducible way to export a git repository into one or more files. So one unfortunate consequence of all this work is that the gnulib *.orig.tar.gz tarball in Debian is not reproducible any more. I have tried to get Git bundles to be reproducible but I never got it to work see my notes in gnulib s debian/README.source on this aspect. Of course, source tarball reproducibility has nothing to do with binary reproducibility of gnulib in Debian itself, fortunately. One open question is how to deal with the increased build dependencies that is triggered by this approach. Some people are surprised by this but I don t see how to get around it: if you depend on source code for tools in another package to build your package, it is a bad idea to hide that dependency. We ve done it for a long time through vendored code in non-minimal tarballs. Libntlm isn t the most critical project from a bootstrapping perspective, so adding git and gnulib as Build-Depends to it will probably be fine. However, consider if this pattern was used for other packages that uses gnulib such as coreutils, gzip, tar, bison etc (all are using gnulib) then they would all Build-Depends on git and gnulib. Cross-building those packages for a new architecture will therefor require git on that architecture first, which gets circular quick. The dependency on gnulib is real so I don t see that going away, and gnulib is a Architecture:all package. However, the dependency on git is merely a consequence of how the Debian gnulib package chose to make all gnulib git commits available to projects: through a git bundle. There are other ways to do this that doesn t require the git tool to extract the necessary files, but none that I found practical ideas welcome! Finally some brief notes on how this was implemented. Enabling bootstrappable source-only minimal tarballs via gnulib s ./bootstrap is achieved by using the GNULIB_REVISION mechanism, locking down the gnulib commit used. I have always disliked git submodules because they add extra steps and has complicated interaction with CI/CD. The reason why I gave up git submodules now is because the particular commit to use is not recorded in the git archive output when git submodules is used. So the particular gnulib commit has to be mentioned explicitly in some source code that goes into the git archive tarball. Colin Watson added the GNULIB_REVISION approach to ./bootstrap back in 2018, and now it no longer made sense to continue to use a gnulib git submodule. One alternative is to use ./bootstrap with --gnulib-srcdir or --gnulib-refdir if there is some practical problem with the GNULIB_URL towards a git bundle the GNULIB_REVISION in bootstrap.conf. The srcdist make rule is simple:
git archive --prefix=libntlm-v1.8/ -o libntlm-1.8-src.tar.gz HEAD
Making the make dist generated tarball reproducible can be more complicated, however for Libntlm it was sufficient to make sure the modification times of all files were set deterministically to the timestamp of the last commit in the git repository. Interestingly there seems to be a couple of different ways to accomplish this, Guix doesn t support minimal source-only tarballs but rely on a .tarball-timestamp file inside the tarball. Paul Eggert explained what TZDB is using some time ago. The approach I m using now is fairly similar to the one I suggested over a year ago. If there are problems because all files in the tarball now use the same modification time, there is a solution by Bruno Haible that could be implemented. Side note on git tags: Some people may wonder why not verify a signed git tag instead of verifying a signed tarball of the git archive. Currently most git repositories uses SHA-1 for git commit identities, but SHA-1 is not a secure hash function. While current SHA-1 attacks can be detected and mitigated, there are fundamental doubts that a git SHA-1 commit identity uniquely refers to the same content that was intended. Verifying a git tag will never offer the same assurance, since a git tag can be moved or re-signed at any time. Verifying a git commit is better but then we need to trust SHA-1. Migrating git to SHA-256 would resolve this aspect, but most hosting sites such as GitLab and GitHub does not support this yet. There are other advantages to using signed tarballs instead of signed git commits or git tags as well, e.g., tar.gz can be a deterministically reproducible persistent stable offline storage format but .git sub-directory trees or git bundles do not offer this property. Doing continous testing of all this is critical to make sure things don t regress. Libntlm s pipeline definition now produce the generated libntlm-*.tar.gz tarballs and a checksum as a build artifact. Then I added the 000-reproducability job which compares the checksums and fails on mismatches. You can read its delicate output in the job for the v1.8 release. Right now we insists that builds on Trisquel 11 match Ubuntu 22.04, that PureOS 10 builds match Debian 11 builds, that AlmaLinux 8 builds match RockyLinux 8 builds, and AlmaLinux 9 builds match RockyLinux 9 builds. As you can see in pipeline job output, not all platforms lead to the same tarballs, but hopefully this state can be improved over time. There is also partial reproducibility, where the full tarball is reproducible across two distributions but not the minimal tarball, or vice versa. If this way of working plays out well, I hope to implement it in other projects too. What do you think? Happy Hacking!

11 April 2024

Reproducible Builds: Reproducible Builds in March 2024

Welcome to the March 2024 report from the Reproducible Builds project! In our reports, we attempt to outline what we have been up to over the past month, as well as mentioning some of the important things happening more generally in software supply-chain security. As ever, if you are interested in contributing to the project, please visit our Contribute page on our website. Table of contents:
  1. Arch Linux minimal container userland now 100% reproducible
  2. Validating Debian s build infrastructure after the XZ backdoor
  3. Making Fedora Linux (more) reproducible
  4. Increasing Trust in the Open Source Supply Chain with Reproducible Builds and Functional Package Management
  5. Software and source code identification with GNU Guix and reproducible builds
  6. Two new Rust-based tools for post-processing determinism
  7. Distribution work
  8. Mailing list highlights
  9. Website updates
  10. Delta chat clients now reproducible
  11. diffoscope updates
  12. Upstream patches
  13. Reproducibility testing framework

Arch Linux minimal container userland now 100% reproducible In remarkable news, Reproducible builds developer kpcyrd reported that that the Arch Linux minimal container userland is now 100% reproducible after work by developers dvzv and Foxboron on the one remaining package. This represents a real world , widely-used Linux distribution being reproducible. Their post, which kpcyrd suffixed with the question now what? , continues on to outline some potential next steps, including validating whether the container image itself could be reproduced bit-for-bit. The post, which was itself a followup for an Arch Linux update earlier in the month, generated a significant number of replies.

Validating Debian s build infrastructure after the XZ backdoor From our mailing list this month, Vagrant Cascadian wrote about being asked about trying to perform concrete reproducibility checks for recent Debian security updates, in an attempt to gain some confidence about Debian s build infrastructure given that they performed builds in environments running the high-profile XZ vulnerability. Vagrant reports (with some caveats):
So far, I have not found any reproducibility issues; everything I tested I was able to get to build bit-for-bit identical with what is in the Debian archive.
That is to say, reproducibility testing permitted Vagrant and Debian to claim with some confidence that builds performed when this vulnerable version of XZ was installed were not interfered with.

Making Fedora Linux (more) reproducible In March, Davide Cavalca gave a talk at the 2024 Southern California Linux Expo (aka SCALE 21x) about the ongoing effort to make the Fedora Linux distribution reproducible. Documented in more detail on Fedora s website, the talk touched on topics such as the specifics of implementing reproducible builds in Fedora, the challenges encountered, the current status and what s coming next. (YouTube video)

Increasing Trust in the Open Source Supply Chain with Reproducible Builds and Functional Package Management Julien Malka published a brief but interesting paper in the HAL open archive on Increasing Trust in the Open Source Supply Chain with Reproducible Builds and Functional Package Management:
Functional package managers (FPMs) and reproducible builds (R-B) are technologies and methodologies that are conceptually very different from the traditional software deployment model, and that have promising properties for software supply chain security. This thesis aims to evaluate the impact of FPMs and R-B on the security of the software supply chain and propose improvements to the FPM model to further improve trust in the open source supply chain. PDF
Julien s paper poses a number of research questions on how the model of distributions such as GNU Guix and NixOS can be leveraged to further improve the safety of the software supply chain , etc.

Software and source code identification with GNU Guix and reproducible builds In a long line of commendably detailed blog posts, Ludovic Court s, Maxim Cournoyer, Jan Nieuwenhuizen and Simon Tournier have together published two interesting posts on the GNU Guix blog this month. In early March, Ludovic Court s, Maxim Cournoyer, Jan Nieuwenhuizen and Simon Tournier wrote about software and source code identification and how that might be performed using Guix, rhetorically posing the questions: What does it take to identify software ? How can we tell what software is running on a machine to determine, for example, what security vulnerabilities might affect it? Later in the month, Ludovic Court s wrote a solo post describing adventures on the quest for long-term reproducible deployment. Ludovic s post touches on GNU Guix s aim to support time travel , the ability to reliably (and reproducibly) revert to an earlier point in time, employing the iconic image of Harold Lloyd hanging off the clock in Safety Last! (1925) to poetically illustrate both the slapstick nature of current modern technology and the gymnastics required to navigate hazards of our own making.

Two new Rust-based tools for post-processing determinism Zbigniew J drzejewski-Szmek announced add-determinism, a work-in-progress reimplementation of the Reproducible Builds project s own strip-nondeterminism tool in the Rust programming language, intended to be used as a post-processor in RPM-based distributions such as Fedora In addition, Yossi Kreinin published a blog post titled refix: fast, debuggable, reproducible builds that describes a tool that post-processes binaries in such a way that they are still debuggable with gdb, etc.. Yossi post details the motivation and techniques behind the (fast) performance of the tool.

Distribution work In Debian this month, since the testing framework no longer varies the build path, James Addison performed a bulk downgrade of the bug severity for issues filed with a level of normal to a new level of wishlist. In addition, 28 reviews of Debian packages were added, 38 were updated and 23 were removed this month adding to ever-growing knowledge about identified issues. As part of this effort, a number of issue types were updated, including Chris Lamb adding a new ocaml_include_directories toolchain issue [ ] and James Addison adding a new filesystem_order_in_java_jar_manifest_mf_include_resource issue [ ] and updating the random_uuid_in_notebooks_generated_by_nbsphinx to reference a relevant discussion thread [ ]. In addition, Roland Clobus posted his 24th status update of reproducible Debian ISO images. Roland highlights that the images for Debian unstable often cannot be generated due to changes in that distribution related to the 64-bit time_t transition. Lastly, Bernhard M. Wiedemann posted another monthly update for his reproducibility work in openSUSE.

Mailing list highlights Elsewhere on our mailing list this month:

Website updates There were made a number of improvements to our website this month, including:
  • Pol Dellaiera noticed the frequent need to correctly cite the website itself in academic work. To facilitate easier citation across multiple formats, Pol contributed a Citation File Format (CIF) file. As a result, an export in BibTeX format is now available in the Academic Publications section. Pol encourages community contributions to further refine the CITATION.cff file. Pol also added an substantial new section to the buy in page documenting the role of Software Bill of Materials (SBOMs) and ephemeral development environments. [ ][ ]
  • Bernhard M. Wiedemann added a new commandments page to the documentation [ ][ ] and fixed some incorrect YAML elsewhere on the site [ ].
  • Chris Lamb add three recent academic papers to the publications page of the website. [ ]
  • Mattia Rizzolo and Holger Levsen collaborated to add Infomaniak as a sponsor of amd64 virtual machines. [ ][ ][ ]
  • Roland Clobus updated the stable outputs page, dropping version numbers from Python documentation pages [ ] and noting that Python s set data structure is also affected by the PYTHONHASHSEED functionality. [ ]

Delta chat clients now reproducible Delta Chat, an open source messaging application that can work over email, announced this month that the Rust-based core library underlying Delta chat application is now reproducible.

diffoscope diffoscope is our in-depth and content-aware diff utility that can locate and diagnose reproducibility issues. This month, Chris Lamb made a number of changes such as uploading versions 259, 260 and 261 to Debian and made the following additional changes:
  • New features:
    • Add support for the zipdetails tool from the Perl distribution. Thanks to Fay Stegerman and Larry Doolittle et al. for the pointer and thread about this tool. [ ]
  • Bug fixes:
    • Don t identify Redis database dumps as GNU R database files based simply on their filename. [ ]
    • Add a missing call to File.recognizes so we actually perform the filename check for GNU R data files. [ ]
    • Don t crash if we encounter an .rdb file without an equivalent .rdx file. (#1066991)
    • Correctly check for 7z being available and not lz4 when testing 7z. [ ]
    • Prevent a traceback when comparing a contentful .pyc file with an empty one. [ ]
  • Testsuite improvements:
    • Fix .epub tests after supporting the new zipdetails tool. [ ]
    • Don t use parenthesis within test skipping messages, as PyTest adds its own parenthesis. [ ]
    • Factor out Python version checking in test_zip.py. [ ]
    • Skip some Zip-related tests under Python 3.10.14, as a potential regression may have been backported to the 3.10.x series. [ ]
    • Actually test 7z support in the test_7z set of tests, not the lz4 functionality. (Closes: reproducible-builds/diffoscope#359). [ ]
In addition, Fay Stegerman updated diffoscope s monkey patch for supporting the unusual Mozilla ZIP file format after Python s zipfile module changed to detect potentially insecure overlapping entries within .zip files. (#362) Chris Lamb also updated the trydiffoscope command line client, dropping a build-dependency on the deprecated python3-distutils package to fix Debian bug #1065988 [ ], taking a moment to also refresh the packaging to the latest Debian standards [ ]. Finally, Vagrant Cascadian submitted an update for diffoscope version 260 in GNU Guix. [ ]

Upstream patches This month, we wrote a large number of patches, including: Bernhard M. Wiedemann used reproducibility-tooling to detect and fix packages that added changes in their %check section, thus failing when built with the --no-checks option. Only half of all openSUSE packages were tested so far, but a large number of bugs were filed, including ones against caddy, exiv2, gnome-disk-utility, grisbi, gsl, itinerary, kosmindoormap, libQuotient, med-tools, plasma6-disks, pspp, python-pypuppetdb, python-urlextract, rsync, vagrant-libvirt and xsimd. Similarly, Jean-Pierre De Jesus DIAZ employed reproducible builds techniques in order to test a proposed refactor of the ath9k-htc-firmware package. As the change produced bit-for-bit identical binaries to the previously shipped pre-built binaries:
I don t have the hardware to test this firmware, but the build produces the same hashes for the firmware so it s safe to say that the firmware should keep working.

Reproducibility testing framework The Reproducible Builds project operates a comprehensive testing framework running primarily at tests.reproducible-builds.org in order to check packages and other artifacts for reproducibility. In March, an enormous number of changes were made by Holger Levsen:
  • Debian-related changes:
    • Sleep less after a so-called 404 package state has occurred. [ ]
    • Schedule package builds more often. [ ][ ]
    • Regenerate all our HTML indexes every hour, but only every 12h for the released suites. [ ]
    • Create and update unstable and experimental base systems on armhf again. [ ][ ]
    • Don t reschedule so many depwait packages due to the current size of the i386 architecture queue. [ ]
    • Redefine our scheduling thresholds and amounts. [ ]
    • Schedule untested packages with a higher priority, otherwise slow architectures cannot keep up with the experimental distribution growing. [ ]
    • Only create the stats_buildinfo.png graph once per day. [ ][ ]
    • Reproducible Debian dashboard: refactoring, update several more static stats only every 12h. [ ]
    • Document how to use systemctl with new systemd-based services. [ ]
    • Temporarily disable armhf and i386 continuous integration tests in order to get some stability back. [ ]
    • Use the deb.debian.org CDN everywhere. [ ]
    • Remove the rsyslog logging facility on bookworm systems. [ ]
    • Add zst to the list of packages which are false-positive diskspace issues. [ ]
    • Detect failures to bootstrap Debian base systems. [ ]
  • Arch Linux-related changes:
    • Temporarily disable builds because the pacman package manager is broken. [ ][ ]
    • Split reproducible_html_live_status and split the scheduling timing . [ ][ ][ ]
    • Improve handling when database is locked. [ ][ ]
  • Misc changes:
    • Show failed services that require manual cleanup. [ ][ ]
    • Integrate two new Infomaniak nodes. [ ][ ][ ][ ]
    • Improve IRC notifications for artifacts. [ ]
    • Run diffoscope in different systemd slices. [ ]
    • Run the node health check more often, as it can now repair some issues. [ ][ ]
    • Also include the string Bot in the userAgent for Git. (Re: #929013). [ ]
    • Document increased tmpfs size on our OUSL nodes. [ ]
    • Disable memory account for the reproducible_build service. [ ][ ]
    • Allow 10 times as many open files for the Jenkins service. [ ]
    • Set OOMPolicy=continue and OOMScoreAdjust=-1000 for both the Jenkins and the reproducible_build service. [ ]
Mattia Rizzolo also made the following changes:
  • Debian-related changes:
    • Define a systemd slice to group all relevant services. [ ][ ]
    • Add a bunch of quotes in scripts to assuage the shellcheck tool. [ ]
    • Add stats on how many packages have been built today so far. [ ]
    • Instruct systemd-run to handle diffoscope s exit codes specially. [ ]
    • Prefer the pgrep tool over grepping the output of ps. [ ]
    • Re-enable a couple of i386 and armhf architecture builders. [ ][ ]
    • Fix some stylistic issues flagged by the Python flake8 tool. [ ]
    • Cease scheduling Debian unstable and experimental on the armhf architecture due to the time_t transition. [ ]
    • Start a few more i386 & armhf workers. [ ][ ][ ]
    • Temporarly skip pbuilder updates in the unstable distribution, but only on the armhf architecture. [ ]
  • Other changes:
    • Perform some large-scale refactoring on how the systemd service operates. [ ][ ]
    • Move the list of workers into a separate file so it s accessible to a number of scripts. [ ]
    • Refactor the powercycle_x86_nodes.py script to use the new IONOS API and its new Python bindings. [ ]
    • Also fix nph-logwatch after the worker changes. [ ]
    • Do not install the stunnel tool anymore, it shouldn t be needed by anything anymore. [ ]
    • Move temporary directories related to Arch Linux into a single directory for clarity. [ ]
    • Update the arm64 architecture host keys. [ ]
    • Use a common Postfix configuration. [ ]
The following changes were also made by:
  • Jan-Benedict Glaw:
    • Initial work to clean up a messy NetBSD-related script. [ ][ ]
  • Roland Clobus:
    • Show the installer log if the installer fails to build. [ ]
    • Avoid the minus character (i.e. -) in a variable in order to allow for tags in openQA. [ ]
    • Update the schedule of Debian live image builds. [ ]
  • Vagrant Cascadian:
    • Maintenance on the virt* nodes is completed so bring them back online. [ ]
    • Use the fully qualified domain name in configuration. [ ]
Node maintenance was also performed by Holger Levsen, Mattia Rizzolo [ ][ ] and Vagrant Cascadian [ ][ ][ ][ ]

If you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. However, you can get in touch with us via:

14 January 2024

Uwe Kleine-K nig: PGP Keysigning on FOSDEM'24

I'm going to FOSDEM'24. Assuming to meet Debian and Kernel folks there, this should be a good opportunity to do PGP keysigning. If you also go there and you're interested in keysigning: Send me your key via email to fosdem24-keysigning@kleine-koenig.org. I'll collect the keys, create a paper list for a keysigning party and send it back to you in the week before FOSDEM. The list will only be made available to other participants. Then maybe wear a "keysigning" badge (or a crepe tape with that caption) that allows us to identify others on the list. If there are not too many who are interested, I guess that should work fine. (If this idea becomes too successful, I'd close the list after the first 100 participants.) Update: Several people let me know there is another keysigning announced on Ludovic Hirlimann's blog. So you might want to bring some more paper slips with your fingerprint and show up there, too.

16 November 2023

Dimitri John Ledkov: Ubuntu 23.10 significantly reduces the installed kernel footprint


Photo by Pixabay
Ubuntu systems typically have up to 3 kernels installed, before they are auto-removed by apt on classic installs. Historically the installation was optimized for metered download size only. However, kernel size growth and usage no longer warrant such optimizations. During the 23.10 Mantic Minatour cycle, I led a coordinated effort across multiple teams to implement lots of optimizations that together achieved unprecedented install footprint improvements.

Given a typical install of 3 generic kernel ABIs in the default configuration on a regular-sized VM (2 CPU cores 8GB of RAM) the following metrics are achieved in Ubuntu 23.10 versus Ubuntu 22.04 LTS:

  • 2x less disk space used (1,417MB vs 2,940MB, including initrd)

  • 3x less peak RAM usage for the initrd boot (68MB vs 204MB)

  • 0.5x increase in download size (949MB vs 600MB)

  • 2.5x faster initrd generation (4.5s vs 11.3s)

  • approximately the same total time (103s vs 98s, hardware dependent)


For minimal cloud images that do not install either linux-firmware or modules extra the numbers are:

  • 1.3x less disk space used (548MB vs 742MB)

  • 2.2x less peak RAM usage for initrd boot (27MB vs 62MB)

  • 0.4x increase in download size (207MB vs 146MB)


Hopefully, the compromise of download size, relative to the disk space & initrd savings is a win for the majority of platforms and use cases. For users on extremely expensive and metered connections, the likely best saving is to receive air-gapped updates or skip updates.

This was achieved by precompressing kernel modules & firmware files with the maximum level of Zstd compression at package build time; making actual .deb files uncompressed; assembling the initrd using split cpio archives - uncompressed for the pre-compressed files, whilst compressing only the userspace portions of the initrd; enabling in-kernel module decompression support with matching kmod; fixing bugs in all of the above, and landing all of these things in time for the feature freeze. Whilst leveraging the experience and some of the design choices implementations we have already been shipping on Ubuntu Core. Some of these changes are backported to Jammy, but only enough to support smooth upgrades to Mantic and later. Complete gains are only possible to experience on Mantic and later.

The discovered bugs in kernel module loading code likely affect systems that use LoadPin LSM with kernel space module uncompression as used on ChromeOS systems. Hopefully, Kees Cook or other ChromeOS developers pick up the kernel fixes from the stable trees. Or you know, just use Ubuntu kernels as they do get fixes and features like these first.

The team that designed and delivered these changes is large: Benjamin Drung, Andrea Righi, Juerg Haefliger, Julian Andres Klode, Steve Langasek, Michael Hudson-Doyle, Robert Kratky, Adrien Nader, Tim Gardner, Roxana Nicolescu - and myself Dimitri John Ledkov ensuring the most optimal solution is implemented, everything lands on time, and even implementing portions of the final solution.

Hi, It's me, I am a Staff Engineer at Canonical and we are hiring https://canonical.com/careers.

Lots of additional technical details and benchmarks on a huge range of diverse hardware and architectures, and bikeshedding all the things below:

For questions and comments please post to Kernel section on Ubuntu Discourse.



10 September 2023

Freexian Collaborators: Debian Contributions: /usr-merge updates, Salsa CI progress, DebConf23 lead-up, and more! (by Utkarsh Gupta)

Contributing to Debian is part of Freexian s mission. This article covers the latest achievements of Freexian and their collaborators. All of this is made possible by organizations subscribing to our Long Term Support contracts and consulting services.

/usr-merge work, by Helmut Grohne, et al. Given that we now have consensus on moving forward by moving aliased files from / to /usr, we will also run into the problems that the file move moratorium was meant to prevent. The way forward is detecting them early and applying workarounds on a per-package basis. Said detection is now automated using the Debian Usr Merge Analysis Tool. As problems are reported to the bug tracking system, they are connected to the reports if properly usertagged. Bugs and patches for problem categories DEP17-P2 and DEP17-P6 have been filed. After consensus has been reached on the bootstrapping matters, debootstrap has been changed to swap the initial unpack and merging to avoid unpack errors due to pre-existing links. This is a precondition for having base-files install the aliasing symbolic links eventually. It was identified that the root filesystem used by the Debian installer is still unmerged and a change has been proposed. debhelper was changed to recognize systemd units installed to /usr. A discussion with the CTTE and release team on repealing the moratorium has been initiated.

Salsa CI work, by Santiago Ruano Rinc n August was a busy month in the Salsa CI world. Santiago reviewed and merged a bunch of MRs that have improved the project in different aspects: The aptly job got two MRs from Philip Hands. With the first one, the aptly now can export a couple of variables in a dotenv file, and with the second, it can include packages from multiple artifact directories. These MRs bring the base to improve how to test reverse dependencies with Salsa CI. Santiago is working on documenting this. As a result of the mass bug filing done in August, Salsa CI now includes a job to test how a package builds twice in a row. Thanks to the MRs of Sebastiaan Couwenberg and Johannes Schauer Marin Rodrigues. Last but not least, Santiago helped Johannes Schauer Marin Rodrigues to complete the support for arm64-only pipelines.

DebConf23 lead-up, by Stefano Rivera Stefano wears a few hats in the DebConf organization and in the lead up to the conference in mid-September, they ve all been quite busy. As one of the treasurers of DebConf 23, there has been a final budget update, and quite a few payments to coordinate from Debian s Trusted Organizations. We try to close the books from the previous conference at the next one, so a push was made to get DebConf 22 account statements out of TOs and record them in the conference ledger. As a website developer, we had a number of registration-related tasks, emailing attendees and trying to estimate numbers for food and accommodation. As a conference committee member, the job was mostly taking calls and helping the local team to make decisions on urgent issues. For example, getting conference visas issued to attendees required getting political approval from the Indian government. We only discovered the full process for this too late to clear some complex cases, so this required some hard calls on skipping some countries from the application list, allowing everyone else to get visas in time. Unfortunate, but necessary.

Miscellaneous contributions
  • Rapha l Hertzog updated gnome-shell-extension-hamster to a new upstream git snapshot that is compatible with GNOME Shell 44 that was recently uploaded to Debian unstable/testing. This extension makes it easy to start/stop tracking time with Hamster Time Tracker. Very handy for consultants like us who are billing their work per hour.
  • Rapha l also updated zim to the latest upstream release (0.74.2). This is a desktop wiki that can be very useful as a note-taking tool to build your own personal knowledge base or even to manage your personal todo lists.
  • Utkarsh reviewed and sponsored some uploads from mentors.debian.net.
  • Utkarsh helped the local team and the bursary team with some more DebConf activities and helped finalize the data.
  • Thorsten tried to update package hplip. Unfortunately upstream added some new compressed files that need to appear uncompressed in the package. Even though this sounded like an easy task, which seemed to be already implemented in the current debian/rules, the new type of files broke this implementation and made the package no longer buildable. The problem has been solved and the upload will happen soon.
  • Helmut sent 7 patches for cross build failures. Since dpkg-buildflags now defaults to issue arm64-specific compiler flags, more care is needed to distinguish between build architecture flags and host architecture flags than previously.
  • Stefano pushed the final bit of the tox 4 transition over the line in Debian, allowing dh-python and tox 4 to migrate to testing. We got caught up in a few unusual bugs in tox and the way we run it in Debian package building (which had to change with tox 4). This resulted in a couple of patches upstream.
  • Stefano visited Haifa, Israel, to see the proposed DebConf 24 venue and meet with the local team. While the venue isn t committed yet, we have high hopes for it.

4 August 2023

Reproducible Builds: Reproducible Builds in July 2023

Welcome to the July 2023 report from the Reproducible Builds project. In our reports, we try to outline the most important things that we have been up to over the past month. As ever, if you are interested in contributing to the project, please visit the Contribute page on our website.
Marcel Fourn et al. presented at the IEEE Symposium on Security and Privacy in San Francisco, CA on The Importance and Challenges of Reproducible Builds for Software Supply Chain Security. As summarised in last month s report, the abstract of their paper begins:
The 2020 Solarwinds attack was a tipping point that caused a heightened awareness about the security of the software supply chain and in particular the large amount of trust placed in build systems. Reproducible Builds (R-Bs) provide a strong foundation to build defenses for arbitrary attacks against build systems by ensuring that given the same source code, build environment, and build instructions, bitwise-identical artifacts are created. (PDF)

Chris Lamb published an interview with Simon Butler, associate senior lecturer in the School of Informatics at the University of Sk vde, on the business adoption of Reproducible Builds. (This is actually the seventh instalment in a series featuring the projects, companies and individuals who support our project. We started this series by featuring the Civil Infrastructure Platform project, and followed this up with a post about the Ford Foundation as well as recent ones about ARDC, the Google Open Source Security Team (GOSST), Bootstrappable Builds, the F-Droid project and David A. Wheeler.) Vagrant Cascadian presented Breaking the Chains of Trusting Trust at FOSSY 2023.
Rahul Bajaj has been working with Roland Clobus on merging an overview of environment variations to our website:
I have identified 16 root causes for unreproducible builds in my empirical study, which I have linked to the corresponding documentation. The initial MR right now contains information about 10 root causes. For each root cause, I have provided a definition, a notable instance, and a workaround. However, I have only found workarounds for 5 out of the 10 root causes listed in this merge request. In the upcoming commits, I plan to add an additional 6 root causes. I kindly request you review the text for any necessary refinements, modifications, or corrections. Additionally, I would appreciate the help with documentation for the solutions/workarounds for the remaining root causes: Archive Metadata, Build ID, File System Ordering, File Permissions, and Snippet Encoding. Your input on the identified root causes for unreproducible builds would be greatly appreciated. [ ]

Just a reminder that our upcoming Reproducible Builds Summit is set to take place from October 31st November 2nd 2023 in Hamburg, Germany. Our summits are a unique gathering that brings together attendees from diverse projects, united by a shared vision of advancing the Reproducible Builds effort. During this enriching event, participants will have the opportunity to engage in discussions, establish connections and exchange ideas to drive progress in this vital field. If you re interested in joining us this year, please make sure to read the event page which has more details about the event and location.
There was more progress towards making the Go programming language ecosystem reproducible this month, including: In addition, kpcyrd posted to our mailing list to report that:
while packaging govulncheck for Arch Linux I noticed a checksum mismatch for a tar file I downloaded from go.googlesource.com. I used diffoscope to compare the .tar file I downloaded with the .tar file the build server downloaded, and noticed the timestamps are different.

In Debian, 20 reviews of Debian packages were added, 25 were updated and 25 were removed this month adding to our knowledge about identified issues. A number of issue types were updated, including marking ffile_prefix_map_passed_to_clang being fixed since Debian bullseye [ ] and adding a Debian bug tracker reference for the nondeterminism_added_by_pyqt5_pyrcc5 issue [ ]. In addition, Roland Clobus posted another detailed update of the status of reproducible Debian ISO images on our mailing list. In particular, Roland helpfully summarised that live images are looking good, and the number of (passing) automated tests is growing .
Bernhard M. Wiedemann published another monthly report about reproducibility within openSUSE.
F-Droid added 20 new reproducible apps in July, making 165 apps in total that are published with Reproducible Builds and using the upstream developer s signature. [ ]
The Sphinx documentation tool recently accepted a change to improve deterministic reproducibility of documentation. It s internal util.inspect.object_description attempts to sort collections, but this can fail. The change handles the failure case by using string-based object descriptions as a fallback deterministic sort ordering, as well as adding recursive object-description calls for list and tuple datatypes. As a result, documentation generated by Sphinx will be more likely to be automatically reproducible. Lastly in news, kpcyrd posted to our mailing list announcing a new repro-env tool:
My initial interest in reproducible builds was how do I distribute pre-compiled binaries on GitHub without people raising security concerns about them . I ve cycled back to this original problem about 5 years later and built a tool that is meant to address this. [ ]

Upstream patches The Reproducible Builds project detects, dissects and attempts to fix as many currently-unreproducible packages as possible. We endeavour to send all of our patches upstream where appropriate. This month, we wrote a large number of such patches, including:
In diffoscope development this month, versions 244, 245 and 246 were uploaded to Debian unstable by Chris Lamb, who also made the following changes:
  • Don t include the file size in image metadata. It is, at best, distracting, and it is already in the directory metadata. [ ]
  • Add compatibility with libarchive-5. [ ]
  • Mark that the test_dex::test_javap_14_differences test requires the procyon tool. [ ]
  • Initial work on DOS/MBR extraction. [ ]
  • Move to using assert_diff in the .ico and .jpeg tests. [ ]
  • Temporarily mark some Android-related as XFAIL due to Debian bugs #1040941 & #1040916. [ ]
  • Fix the test skipped reason generation in the case of a version outside of the required range. [ ]
  • Update copyright years. [ ][ ]
  • Fix try.diffoscope.org. [ ]
In addition, Gianfranco Costamagna added support for LLVM version 16. [ ]

Testing framework The Reproducible Builds project operates a comprehensive testing framework (available at tests.reproducible-builds.org) in order to check packages and other artifacts for reproducibility. In July, a number of changes were made by Holger Levsen:
  • General changes:
    • Upgrade Jenkins host to Debian bookworm now that Debian 12.1 is out. [ ][ ][ ][ ]
    • djm: improve UX when rebooting a node fails. [ ]
    • djm: reduce wait time between rebooting nodes. [ ]
  • Debian-related changes:
    • Various refactoring of the Debian scheduler. [ ][ ][ ]
    • Make Debian live builds more robust with respect to salsa.debian.org returning HTTP 502 errors. [ ][ ]
    • Use the legacy SCP protocol instead of the SFTP protocol when transfering Debian live builds. [ ][ ]
    • Speed up a number of database queries thanks, Myon! [ ][ ][ ][ ][ ]
    • Split create_meta_pkg_sets job into two (for Debian unstable and Debian testing) to half the job runtime to approximately 90 minutes. [ ][ ]
    • Split scheduler job into four separate jobs, one for each tested architecture. [ ][ ]
    • Treat more PostgreSQL errors as serious (for some jobs). [ ]
    • Re-enable automatic database documentation now that postgresql_autodoc is back in Debian bookworm. [ ]
    • Remove various hardcoding of Debian release names. [ ]
    • Drop some i386 special casing. [ ]
  • Other distributions:
    • Speed up Alpine SQL queries. [ ]
    • Adjust CSS layout for Arch Linux pages to match 3 and not 4 repos being tested. [ ]
    • Drop the community Arch Linux repo as it has now been merged into the extra repo. [ ]
    • Speed up a number of Arch-related database queries. [ ]
    • Try harder to properly cleanup after building OpenWrt packages. [ ]
    • Drop all kfreebsd-related tests now that it s officially dead. [ ]
  • System health:
    • Always ignore some well-known harmless orphan processes. [ ][ ][ ]
    • Detect another case of job failure due to Jenkins shutdown. [ ]
    • Show all non co-installable package sets on the status page. [ ]
    • Warn that some specific reboot nodes are currently false-positives. [ ]
  • Node health checks:
    • Run system and node health checks for Jenkins less frequently. [ ]
    • Try to restart any failed dpkg-db-backup [ ] and munin-node services [ ].
In addition, Vagrant Cascadian updated the paths in our automated to tests to use the same paths used by the official Debian build servers. [ ]

If you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. However, you can get in touch with us via:

1 August 2023

Reproducible Builds: Supporter spotlight: Simon Butler on business adoption of Reproducible Builds

The Reproducible Builds project relies on several projects, supporters and sponsors for financial support, but they are also valued as ambassadors who spread the word about our project and the work that we do. This is the seventh instalment in a series featuring the projects, companies and individuals who support the Reproducible Builds project. We started this series by featuring the Civil Infrastructure Platform project, and followed this up with a post about the Ford Foundation as well as recent ones about ARDC, the Google Open Source Security Team (GOSST), Bootstrappable Builds, the F-Droid project and David A. Wheeler. Today, however, we will be talking with Simon Butler, an associate senior lecturer in the School of Informatics at the University of Sk vde, where he undertakes research in software engineering that focuses on IoT and open source software, and contributes to the teaching of computer science to undergraduates.

Chris: For those who have not heard of it before, can you tell us more about the School of Informatics at Sk vde University? Simon: Certainly, but I may be a little long-winded. Sk vde is a city in the area between the two large lakes in southern Sweden. The city is a busy place. Sk vde is home to the regional hospital, some of Volvo s manufacturing facilities, two regiments of the Swedish defence force, a lot of businesses in the Swedish computer games industry, other tech companies and more. The University of Sk vde is relatively small. Sweden s large land area and low population density mean that regional centres such as Sk vde are important and local universities support businesses by training new staff and supporting innovation. The School of Informatics has two divisions. One focuses on teaching and researching computer games. The other division encompasses a wider range of teaching and research, including computer science, web development, computer security, network administration, data science and so on.
Chris: You recently had a open-access paper published in Software Quality Journal. Could you tell us a little bit more about it and perhaps briefly summarise its key findings? Simon: The paper is one output of a collaborative research project with six Swedish businesses that use open source software. There are two parts to the paper. The first consists of an analysis of what the group of businesses in the project know about Reproducible Builds (R-Bs), their experiences with R-Bs and their perception of the value of R-Bs to the businesses. The second part is an interview study with business practitioners and others with experience and expertise in R-Bs. We set out to try to understand the extent to which software-intensive businesses were aware of R-Bs, the technical and business reasons they were or were not using R-Bs and to document the business and technical use cases for R-Bs. The key findings were that businesses are aware of R-Bs, and some are using R-Bs as part of their day-to-day development process. Some of the uses for R-Bs we found were not previously documented. We also found that businesses understood the value R-Bs have as part of engineering and software quality processes. They are also aware of the costs of implementing R-Bs and that R-Bs are an intangible value proposition - in other words, businesses can add value through process improvement by using R-Bs. But, that, currently at least, R-Bs are not a selling point for software or products.
Chris: You performed a large number of interviews in order to prepare your paper. What was the most surprising response to you? Simon: Most surprising is a good question. Everybody I spoke to brought something new to my understanding of R-Bs, and many responses surprised me. The interviewees that surprised me most were I01 and I02 (interviews were anonymised and interviewees were assigned numeric identities). I02 described the sceptical perspective that there is a viable, pragmatic alternative to R-Bs - verifiable builds - which I was aware of before undertaking the research. The company had developed a sufficiently robust system for their needs and worked well. With a large archive of software used in production, they couldn t justify the cost of retrofitting a different solution that might only offer small advantages over the existing system. Doesn t really sound too surprising, but the interview was one of the first I did on this topic, and I was very focused on the value of, and need for, trust in a system that motivated the R-B. The solution used by the company requires trust, but they seem to have established sufficient trust for their needs by securing their build systems to the extent that they are more or less tamper-proof. The other big surprise for me was I01 s use of R-Bs to support the verification of system configuration in a system with multiple embedded components at boot time. It s such an obvious application of R-Bs, and exactly the kind of response I hoped to get from interviewees. However, it is another instance of a solution where trust is only one factor. In the first instance, the developer is using R-Bs to establish trust in the toolchain. There is also the second application that the developer can use a set of R-Bs to establish that deployed system consists of compatible components. While this might not sound too significant, there appear to be some important potential applications. One that came to mind immediately is a problem with firmware updates on nodes in IoT systems where the node needs to update quickly with limited downtime and without failure. The node also needs to be able to roll back any update proposed by a server if there are conflicts with the current configuration or if any tests on the node fail. Perhaps the chances of failure could be reduced, if a node can instead negotiate with a server to determine a safe path to migrate from its current configuration to a working configuration with the upgraded components the central system requires? Another potential application appears to be in the configuration management of AI systems, where decisions need to be explainable. A means of specifying validated configurations of training data, models and deployed systems might, perhaps, be leveraged to prevent invalid or broken configurations from being deployed in production.
Chris: One of your findings was that reproducible builds were perceived to be good engineering practice . To what extent do you believe cultural forces affect the adoption or rejection of a given technology or practice? Simon: To a large extent. People s decisions are informed by cultural norms, and business decisions are made by people acting collectively. Of course, decision-making, including assessments of risk and usefulness, is mediated by individual positions on the continuum from conformity to non-conformity, as well as individual and in-group norms. Whether a business will consider a given technology for adoption will depend on cultural forces. The decision to adopt may well be made on the grounds of cost and benefits.
Chris: Another conclusion implied by your research is that businesses are often dealing with software deployment lifespans (eg. 20+ years) that differ from widely from those of the typical hobbyist programmer. To what degree do you think this temporal mismatch is a problem for both groups? Simon: This is a fascinating question. Long-term software maintenance is a requirement in some industries because of the working lifespans of the products and legal requirements to maintain the products for a fixed period. For some other industries, it is less of a problem. Consequently, I would tend to divide developers into those who have been exposed to long-term maintenance problems and those who have not. Although, more professional than hobbyist developers will have been exposed to the problem. Nonetheless, there are areas, such as music software, where there are also long-term maintenance challenges for data formats and software.
Chris: Based on your research, what would you say are the biggest blockers for the adoption of reproducible builds within business ? And, based on this, would you have any advice or recommendations for the broader reproducible builds ecosystem? Simon: From the research, the main blocker appears to be cost. Not an absolute cost, but there is an overhead to introducing R-Bs. Businesses (and thus business managers) need to understand the business case for R-Bs. Making decision-makers in businesses aware of R-Bs and that they are valuable will take time. Advocacy at multiple levels appears to be the way forward and this is being done. I would recommend being persistent while being patient and to keep talking about reproducible builds. The work done in Linux distributions raises awareness of R-Bs amongst developers. Guix, NixOS and Software Heritage are all providing practical solutions and getting attention - I ve been seeing progressively more mentions of all three during the last couple of years. Increased awareness amongst developers should lead to more interest within companies. There is also research money being assigned to supply chain security and R-B s. The CHAINS project at KTH in Stockholm is one example of a strategic research project. There may be others that I m not aware of. The policy-level advocacy is slowly getting results in some countries, and where CISA leads, others may follow.
Chris: Was there a particular reason you alighted on the question of the adoption of reproducible builds in business? Do you think there s any truth behind the shopworn stereotype of hacker types neglecting the resources that business might be able to offer? Simon: Much of the motivation for the research came from the contrast between the visibility of R-Bs in open source projects and the relative invisibility of R-Bs in industry. Where companies are known to be using R-Bs (e.g. Google, etc.) there is no fuss, no hype. They were not selling R-Bs as a solution; instead the documentation is very matter-of-fact that R-Bs are part of a customer-facing process in their cloud solutions. An obvious question for me was that if some people use R-B s in software development, why doesn t everybody? There are limits to the tooling for some programming languages that mean R-Bs are difficult or impossible. But where creating an R-B is practical, why are they not used more widely? So, to your second question. There is another factor, which seems to be more about a lack of communication rather than neglecting opportunities. Businesses may not always be willing to discuss their development processes and innovations. Though I do think the increasing number of conferences (big and small) for software practitioners is helping to facilitate more communication and greater exchange of ideas.
Chris: Has your personal view of reproducible builds changed since before you embarked on writing this paper? Simon: Absolutely! In the early stages of the research, I was interested in questions of trust and how R-Bs were applied to resolve build and supply chain security problems. As the research developed, however, I started to see there were benefits to the use of R-Bs that were less obvious and that, in some cases, an R-B can have more than a single application.
Chris: Finally, do you have any plans to do future research touching on reproducible builds? Simon: Yes, definitely. There are a set of problems that interest me. One already mentioned is the use of reproducible builds with AI systems. Interpretable or explainable AI (XAI) is a necessity, and I think that R-Bs can be used to support traceability in the configuration and testing of both deployed systems and systems used during model training and evaluation. I would also like to return to a problem discussed briefly in the article, which is to develop a deeper understanding of the elements involved in the application of R-Bs that can be used to support reasoning about existing and potential applications of R-Bs. For example, R-Bs can be used to establish trust for different groups of individuals at different times, say, between remote developers prior to the release of software and by users after release. One question is whether when an R-B is used might be a significant factor. Another group of questions concerns the ways in which trust (of some sort) propagates among users of an R-B. There is an example in the paper of a company that rebuilds Debian reproducibly for security reasons and is then able to collaborate on software projects where software is built reproducibly with other companies that use public distributions of Debian.
Chris: Many thanks for this interview, Simon. If someone wanted to get in touch or learn more about you and your colleagues at the School of Informatics, where might they go? Thank you for the opportunity. It has been a pleasure to reflect a little more widely on the research! Personally, you can find out about my work on my official homepage and on my personal site. The software systems research group (SSRG) has a website, and the University of Sk vde s English language pages are also available. Chris: Many thanks for this interview, Simon!


For more information about the Reproducible Builds project, please see our website at reproducible-builds.org. If you are interested in ensuring the ongoing security of the software that underpins our civilisation and wish to sponsor the Reproducible Builds project, please reach out to the project by emailing contact@reproducible-builds.org.

29 June 2023

C.J. Collier: Converting a windows install to a libvirt VM

Reduce the size of your c: partition to the smallest it can be and then turn off windows with the understanding that you will never boot this system on the iron ever again.
Boot into a netinst installer image (no GUI). hold alt and press left arrow a few times until you get to a prompt to press enter. Press enter. In this example /dev/sda is your windows disk which contains the c: partition
and /dev/disk/by-id/usb0 is the USB-3 attached SATA controller that you have your SSD attached to (please find an example attached). This SSD should be equal to or larger than the windows disk for best compatability. A photo of a USB-3 attached SATA controller To find the literal path names of your detected drives you can run fdisk -l. Pay attention to the names of the partitions and the sizes of the drives to help determine which is which. Once you have a shell in the netinst installer, you should maybe be able to run a command like the following. This will duplicate the disk located at if (in file) to the disk located at of (out file) while showing progress as the status.
dd if=/dev/sda of=/dev/disk/by-id/usb0 status=progress
If you confirm that dd is available on the netinst image and the previous command runs successfully, test that your windows partition is visible in the new disk s partition table. The start block of the windows partition on each should match, as should the partition size.
fdisk -l /dev/disk/by-id/usb0
fdisk -l /dev/sda
If the output from the first is the same as the output from the second, then you are probably safe to proceed. Once you confirm that you have made and tested a full copy of the blocks from your windows drive saved on your usb disk, nuke your windows partition table from orbit.
dd if=/dev/zero of=/dev/sda bs=1M count=42
You can press alt-f1 to return to the Debian installer now. Follow the instructions to install Debian. Don t forget to remove all attached USB drives. Once you install Debian, press ctrl-alt-f3 to get a root shell. Add your user to the sudoers group:
# adduser cjac sudoers
log out
# exit
log in as your user and confirm that you have sudo
$ sudo ls
Don t forget to read the spider man advice enter your password you ll need to install virt-manager. I think this should help:
$ sudo apt-get install virt-manager libvirt-daemon-driver-qemu qemu-system-x86
insert the USB drive. You can now create a qcow2 file for your virtual machine.
$ sudo qemu-img convert -O qcow2 \
/dev/disk/by-id/usb0 \
/var/lib/libvirt/images/windows.qcow2
I personally create a volume group called /dev/vg00 for the stuff I want to run raw and instead of converting to qcow2 like all of the other users do, I instead write it to a new logical volume.
sudo lvcreate /dev/vg00 -n windows -L 42G # or however large your drive was
sudo dd if=/dev/disk/by-id/usb0 of=/dev/vg00/windows status=progress
Now that you ve got the qcow2 file created, press alt-left until you return to your GDM session. The apt-get install command above installed virt-manager, so log in to your system if you haven t already and open up gnome-terminal by pressing the windows key or moving your mouse/gesture to the top left of your screen. Type in gnome-terminal and either press enter or click/tap on the icon. I like to run this full screen so that I feel like I m in a space ship. If you like to feel like you re in a spaceship, too, press F11. You can start virt-manager from this shell or you can press the windows key and type in virt-manager and press enter. You ll want the shell to run commands such as virsh console windows or virsh list When virt-manager starts, right click on QEMU/KVM and select New.
In the New VM window, select Import existing disk image
When prompted for the path to the image, use the one we created with sudo qemu-img convert above.
Select the version of Windows you want.
Select memory and CPUs to allocate to the VM.
Tick the Customize configuration before install box
If you re prompted to enable the default network, do so now.
The default hardware layout should probably suffice. Get it as close to the underlying hardware as it is convenient to do. But Windows is pretty lenient these days about virtualizing licensed windows instances so long as they re not running in more than one place at a time. Good luck! Leave comments if you have questions.

6 May 2023

Reproducible Builds: Reproducible Builds in April 2023

Welcome to the April 2023 report from the Reproducible Builds project! In these reports we outline the most important things that we have been up to over the past month. And, as always, if you are interested in contributing to the project, please visit our Contribute page on our website.

General news Trisquel is a fully-free operating system building on the work of Ubuntu Linux. This month, Simon Josefsson published an article on his blog titled Trisquel is 42% Reproducible!. Simon wrote:
The absolute number may not be impressive, but what I hope is at least a useful contribution is that there actually is a number on how much of Trisquel is reproducible. Hopefully this will inspire others to help improve the actual metric.
Simon wrote another blog post this month on a new tool to ensure that updates to Linux distribution archive metadata (eg. via apt-get update) will only use files that have been recorded in a globally immutable and tamper-resistant ledger. A similar solution exists for Arch Linux (called pacman-bintrans) which was announced in August 2021 where an archive of all issued signatures is publically accessible.
Joachim Breitner wrote an in-depth blog post on a bootstrap-capable GHC, the primary compiler for the Haskell programming language. As a quick background to what this is trying to solve, in order to generate a fully trustworthy compile chain, trustworthy root binaries are needed and a popular approach to address this problem is called bootstrappable builds where the core idea is to address previously-circular build dependencies by creating a new dependency path using simpler prerequisite versions of software. Joachim takes an somewhat recursive approach to the problem for Haskell, leading to the inadvertently humourous question: Can I turn all of GHC into one module, and compile that? Elsewhere in the world of bootstrapping, Janneke Nieuwenhuizen and Ludovic Court s wrote a blog post on the GNU Guix blog announcing The Full-Source Bootstrap, specifically:
[ ] the third reduction of the Guix bootstrap binaries has now been merged in the main branch of Guix! If you run guix pull today, you get a package graph of more than 22,000 nodes rooted in a 357-byte program something that had never been achieved, to our knowledge, since the birth of Unix.
More info about this change is available on the post itself, including:
The full-source bootstrap was once deemed impossible. Yet, here we are, building the foundations of a GNU/Linux distro entirely from source, a long way towards the ideal that the Guix project has been aiming for from the start. There are still some daunting tasks ahead. For example, what about the Linux kernel? The good news is that the bootstrappable community has grown a lot, from two people six years ago there are now around 100 people in the #bootstrappable IRC channel.

Michael Ablassmeier created a script called pypidiff as they were looking for a way to track differences between packages published on PyPI. According to Micahel, pypidiff uses diffoscope to create reports on the published releases and automatically pushes them to a GitHub repository. This can be seen on the pypi-diff GitHub page (example).
Eleuther AI, a non-profit AI research group, recently unveiled Pythia, a collection of 16 Large Language Model (LLMs) trained on public data in the same order designed specifically to facilitate scientific research. According to a post on MarkTechPost:
Pythia is the only publicly available model suite that includes models that were trained on the same data in the same order [and] all the corresponding data and tools to download and replicate the exact training process are publicly released to facilitate further research.
These properties are intended to allow researchers to understand how gender bias (etc.) can affected by training data and model scale.
Back in February s report we reported on a series of changes to the Sphinx documentation generator that was initiated after attempts to get the alembic Debian package to build reproducibly. Although Chris Lamb was able to identify the source problem and provided a potential patch that might fix it, James Addison has taken the issue in hand, leading to a large amount of activity resulting in a proposed pull request that is waiting to be merged.
WireGuard is a popular Virtual Private Network (VPN) service that aims to be faster, simpler and leaner than other solutions to create secure connections between computing devices. According to a post on the WireGuard developer mailing list, the WireGuard Android app can now be built reproducibly so that its contents can be publicly verified. According to the post by Jason A. Donenfeld, the F-Droid project now does this verification by comparing their build of WireGuard to the build that the WireGuard project publishes. When they match, the new version becomes available. This is very positive news.
Author and public speaker, V. M. Brasseur published a sample chapter from her upcoming book on corporate open source strategy which is the topic of Software Bill of Materials (SBOM):
A software bill of materials (SBOM) is defined as a nested inventory for software, a list of ingredients that make up software components. When you receive a physical delivery of some sort, the bill of materials tells you what s inside the box. Similarly, when you use software created outside of your organisation, the SBOM tells you what s inside that software. The SBOM is a file that declares the software supply chain (SSC) for that specific piece of software. [ ]

Several distributions noticed recent versions of the Linux Kernel are no longer reproducible because the BPF Type Format (BTF) metadata is not generated in a deterministic way. This was discussed on the #reproducible-builds IRC channel, but no solution appears to be in sight for now.

Community news On our mailing list this month: Holger Levsen gave a talk at foss-north 2023 in Gothenburg, Sweden on the topic of Reproducible Builds, the first ten years. Lastly, there were a number of updates to our website, including:
  • Chris Lamb attempted a number of ways to try and fix literal : .lead appearing in the page [ ][ ][ ], made all the Back to who is involved links italics [ ], and corrected the syntax of the _data/sponsors.yml file [ ].
  • Holger Levsen added his recent talk [ ], added Simon Josefsson, Mike Perry and Seth Schoen to the contributors page [ ][ ][ ], reworked the People page a little [ ] [ ], as well as fixed spelling of Arch Linux [ ].
Lastly, Mattia Rizzolo moved some old sponsors to a former section [ ] and Simon Josefsson added Trisquel GNU/Linux. [ ]

Debian
  • Vagrant Cascadian reported on the Debian s build-essential package set, which was inspired by how close we are to making the Debian build-essential set reproducible and how important that set of packages are in general . Vagrant mentioned that: I have some progress, some hope, and I daresay, some fears . [ ]
  • Debian Developer Cyril Brulebois (kibi) filed a bug against snapshot.debian.org after they noticed that there are many missing dinstalls that is to say, the snapshot service is not capturing 100% of all of historical states of the Debian archive. This is relevant to reproducibility because without the availability historical versions, it is becomes impossible to repeat a build at a future date in order to correlate checksums. .
  • 20 reviews of Debian packages were added, 21 were updated and 5 were removed this month adding to our knowledge about identified issues. Chris Lamb added a new build_path_in_line_annotations_added_by_ruby_ragel toolchain issue. [ ]
  • Mattia Rizzolo announced that the data for the stretch archive on tests.reproducible-builds.org has been archived. This matches the archival of stretch within Debian itself. This is of some historical interest, as stretch was the first Debian release regularly tested by the Reproducible Builds project.

Upstream patches The Reproducible Builds project detects, dissects and attempts to fix as many currently-unreproducible packages as possible. We endeavour to send all of our patches upstream where appropriate. This month, we wrote a large number of such patches, including:

diffoscope development diffoscope version 241 was uploaded to Debian unstable by Chris Lamb. It included contributions already covered in previous months as well a change by Chris Lamb to add a missing raise statement that was accidentally dropped in a previous commit. [ ]

Testing framework The Reproducible Builds project operates a comprehensive testing framework (available at tests.reproducible-builds.org) in order to check packages and other artifacts for reproducibility. In April, a number of changes were made, including:
  • Holger Levsen:
    • Significant work on a new Documented Jenkins Maintenance (djm) script to support logged maintenance of nodes, etc. [ ][ ][ ][ ][ ][ ]
    • Add the new APT repo url for Jenkins itself with a new signing key. [ ][ ]
    • In the Jenkins shell monitor, allow 40 GiB of files for diffoscope for the Debian experimental distribution as Debian is frozen around the release at the moment. [ ]
    • Updated Arch Linux testing to cleanup leftover files left in /tmp/archlinux-ci/ after three days. [ ][ ][ ]
    • Mark a number of nodes hosted by Oregon State University Open Source Lab (OSUOSL) as online and offline. [ ][ ][ ]
    • Update the node health checks to detect failures to end schroot sessions. [ ]
    • Filter out another duplicate contributor from the contributor statistics. [ ]
  • Mattia Rizzolo:



If you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. However, you can get in touch with us via:

22 January 2023

Dirk Eddelbuettel: Rcpp 1.0.10 on CRAN: Regular Update

rcpp logo The Rcpp team is thrilled to announce the newest release 1.0.10 of the Rcpp package which is hitting CRAN now and will go to Debian shortly. Windows and macOS builds should appear at CRAN in the next few days, as will builds in different Linux distribution and of course at r2u. The release was prepared a few days ago, but given the widespread use at CRAN it took a few days to be processed. As always, our sincere thanks to the CRAN maintainers Uwe Ligges and Kurt Hornik. This release continues with the six-months cycle started with release 1.0.5 in July 2020. As a reminder, we do of course make interim snapshot dev or rc releases available via the Rcpp drat repo and strongly encourage their use and testing I run my systems with these versions which tend to work just as well, and are also fully tested against all reverse-dependencies. Rcpp has become the most popular way of enhancing R with C or C++ code. Right now, around 2623 packages on CRAN depend on Rcpp for making analytical code go faster and further, along with 252 in BioConductor. On CRAN, 13.7% of all packages depend (directly) on CRAN, and 58.7% of all compiled packages do. From the cloud mirror of CRAN (which is but a subset of all CRAN downloads), Rcpp has been downloaded 67.1 million times. This release is incremental as usual, preserving existing capabilities faithfully while smoothing our corners and / or extending slightly. Of particular note is the now fully-enabled use of the unwind protection making some operations a little faster by default; special thanks to I aki for spearheading this. Kevin and I also polished a few other bugs off as detailed below. The full list of details follows.

Changes in Rcpp release version 1.0.10 (2023-01-12)
  • Changes in Rcpp API:
    • Unwind protection is enabled by default (I aki in #1225). It can be disabled by defining RCPP_NO_UNWIND_PROTECT before including Rcpp.h. RCPP_USE_UNWIND_PROTECT is not checked anymore and has no effect. The associated plugin unwindProtect is therefore deprecated and will be removed in a future release.
    • The 'finalize' method for Rcpp Modules is now eagerly materialized, fixing an issue where errors can occur when Module finalizers are run (Kevin in #1231 closing #1230).
    • Zero-row data.frame objects can receive push_back or push_front (Dirk in #1233 fixing #1232).
    • One remaining sprintf has been replaced by snprintf (Dirk and Kevin in #1236 and #1237).
    • Several conversion warnings found by clang++ have been addressed (Dirk in #1240 and #1241).
  • Changes in Rcpp Attributes:
    • The C++20, C++2b (experimental) and C++23 standards now have plugin support like the other C++ standards (Dirk in #1228).
    • The source path for attributes received one more protection from spaces (Dirk in #1235 addressing #1234).
  • Changes in Rcpp Deployment:
    • Several GitHub Actions have been updated.

Thanks to my CRANberries, you can also look at a diff to the previous release. Questions, comments etc should go to the rcpp-devel mailing list off the R-Forge page. Bugs reports are welcome at the GitHub issue tracker as well (where one can also search among open or closed issues); questions are also welcome under rcpp tag at StackOverflow which also allows searching among the (currently) 2932 previous questions. If you like this or other open-source work I do, you can sponsor me at GitHub.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

7 January 2023

Reproducible Builds: Reproducible Builds in December 2022

Welcome to the December 2022 report from the Reproducible Builds project.
We are extremely pleased to announce that the dates for the Reproducible Builds Summit in 2023 have been announced in 2022 already: We plan to spend three days continuing to the grow of the Reproducible Builds effort. As in previous events, the exact content of the meeting will be shaped by the participants. And, as mentioned in Holger Levsen s post to our mailing list, the dates have been booked and confirmed with the venue, so if you are considering attending, please reserve these dates in your calendar today.
R my Gr nblatt, an associate professor in the T l com Sud-Paris engineering school wrote up his pain points of using Nix and NixOS. Although some of the points do not touch on reproducible builds, R my touches on problems he has encountered with the different kinds of reproducibility that these distributions appear to promise including configuration files affecting the behaviour of systems, the fragility of upstream sources as well as the conventional idea of binary reproducibility.
Morten Linderud reported that he is quietly optimistic that if Go programming language resolves all of its issues with reproducible builds (tracking issue) then the Go binaries distributed from Google and by Arch Linux may be bit-for-bit identical. It s just a bit early to sorta figure out what roadblocks there are. [But] Go bootstraps itself every build, so in theory I think it should be possible.
On December 15th, Holger Levsen published an in-depth interview he performed with David A. Wheeler on supply-chain security and reproducible builds, but it also touches on the biggest challenges in computing as well. This is part of a larger series of posts featuring the projects, companies and individuals who support the Reproducible Builds project. Other instalments include an article featuring the Civil Infrastructure Platform project and followed this up with a post about the Ford Foundation as well as a recent ones about ARDC, the Google Open Source Security Team (GOSST), Jan Nieuwenhuizen on Bootstrappable Builds, GNU Mes and GNU Guix and Hans-Christoph Steiner of the F-Droid project.
A number of changes were made to the Reproducible Builds website and documentation this month, including FC Stegerman adding an F-Droid/apksigcopier example to our embedded signatures page [ ], Holger Levsen making a large number of changes related to the 2022 summit in Venice as well as 2023 s summit in Hamburg [ ][ ][ ][ ] and Simon Butler updated our publications page [ ][ ].
On our mailing list this month, James Addison asked a question about whether there has been any effort to trace the files used by a build system in order to identify the corresponding build-dependency packages. [ ] In addition, Bernhard M. Wiedemann then posed a thought-provoking question asking How to talk to skeptics? , which was occasioned by a colleague who had published a blog post in May 2021 skeptical of reproducible builds. The thread generated a number of replies.

Android news obfusk (FC Stegerman) performed a thought-provoking review of tools designed to determine the difference between two different .apk files shipped by a number of free-software instant messenger applications. These scripts are often necessary in the Android/APK ecosystem due to these files containing embedded signatures so the conventional bit-for-bit comparison cannot be used. After detailing a litany of issues with these tools, they come to the conclusion that:
It s quite possible these messengers actually have reproducible builds, but the verification scripts they use don t actually allow us to verify whether they do.
This reflects the consensus view within the Reproducible Builds project: pursuing a situation in language or package ecosystems where binaries are bit-for-bit identical (over requiring a bespoke ecosystem-specific tool) is not a luxury demanded by purist engineers, but rather the only practical way to demonstrate reproducibility. obfusk also announced the first release of their own set of tools on our mailing list. Related to this, obfusk also posted to an issue filed against Mastodon regarding the difficulties of creating bit-by-bit identical APKs, especially with respect to copying v2/v3 APK signatures created by different tools; they also reported that some APK ordering differences were not caused by building on macOS after all, but by using Android Studio [ ] and that F-Droid added 16 more apps published with Reproducible Builds in December.

Debian As mentioned in last months report, Vagrant Cascadian has been organising a series of online sprints in order to clear the huge backlog of reproducible builds patches submitted by performing NMUs (Non-Maintainer Uploads). During December, meetings were held on the 1st, 8th, 15th, 22nd and 29th, resulting in a large number of uploads and bugs being addressed: The next sprint is due to take place this coming Tuesday, January 10th at 16:00 UTC.

Upstream patches The Reproducible Builds project attempts to fix as many currently-unreproducible packages as possible. This month, we wrote a large number of such patches, including:

Testing framework The Reproducible Builds project operates a comprehensive testing framework at tests.reproducible-builds.org in order to check packages and other artifacts for reproducibility. In October, the following changes were made by Holger Levsen:
  • The osuosl167 machine is no longer a openqa-worker node anymore. [ ][ ]
  • Detect problems with APT repository signatures [ ] and update a repository signing key [ ].
  • reproducible Debian builtin-pho: improve job output. [ ]
  • Only install the foot-terminfo package on Debian systems. [ ]
In addition, Mattia Rizzolo added support for the version of diffoscope in Debian stretch which doesn t support the --timeout flag. [ ][ ]

diffoscope diffoscope is our in-depth and content-aware diff utility. Not only can it locate and diagnose reproducibility issues, it can provide human-readable diffs from many kinds of binary formats. This month, Chris Lamb made the following changes to diffoscope, including preparing and uploading versions 228, 229 and 230 to Debian:
  • Fix compatibility with file(1) version 5.43, with thanks to Christoph Biedl. [ ]
  • Skip the test_html.py::test_diff test if html2text is not installed. (#1026034)
  • Update copyright years. [ ]
In addition, Jelle van der Waa added support for Berkeley DB version 6. [ ] Orthogonal to this, Holger Levsen bumped the Debian Standards-Version on all of our packages, including diffoscope [ ], strip-nondeterminism [ ], disorderfs [ ] and reprotest [ ].
If you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. You can get in touch with us via:

15 December 2022

Reproducible Builds: Supporter spotlight: David A. Wheeler on supply chain security

The Reproducible Builds project relies on several projects, supporters and sponsors for financial support, but they are also valued as ambassadors who spread the word about our project and the work that we do. This is the sixth instalment in a series featuring the projects, companies and individuals who support the Reproducible Builds project. We started this series by featuring the Civil Infrastructure Platform project and followed this up with a post about the Ford Foundation as well as a recent ones about ARDC, the Google Open Source Security Team (GOSST), Jan Nieuwenhuizen on Bootstrappable Builds, GNU Mes and GNU Guix and Hans-Christoph Steiner of the F-Droid project. Today, however, we will be talking with David A. Wheeler, the Director of Open Source Supply Chain Security at the Linux Foundation.

Holger Levsen: Welcome, David, thanks for taking the time to talk with us today. First, could you briefly tell me about yourself? David: Sure! I m David A. Wheeler and I work for the Linux Foundation as the Director of Open Source Supply Chain Security. That just means that my job is to help open source software projects improve their security, including its development, build, distribution, and incorporation in larger works, all the way out to its eventual use by end-users. In my copious free time I also teach at George Mason University (GMU); in particular, I teach a graduate course on how to design and implement secure software. My background is technical. I have a Bachelor s in Electronics Engineering, a Master s in Computer Science and a PhD in Information Technology. My PhD dissertation is connected to reproducible builds. My PhD dissertation was on countering the Trusting Trust attack, an attack that subverts fundamental build system tools such as compilers. The attack was discovered by Karger & Schell in the 1970s, and later demonstrated & popularized by Ken Thompson. In my dissertation on trusting trust I showed that a process called Diverse Double-Compiling (DDC) could detect trusting trust attacks. That process is a specialized kind of reproducible build specifically designed to detect trusting trust style attacks. In addition, countering the trusting trust attack primarily becomes more important only when reproducible builds become more common. Reproducible builds enable detection of build-time subversions. Most attackers wouldn t bother with a trusting trust attack if they could just directly use a build-time subversion of the software they actually want to subvert.
Holger: Thanks for taking the time to introduce yourself to us. What do you think are the biggest challenges today in computing? There are many big challenges in computing today. For example:
Holger: Do you think reproducible builds are an important part in secure computing today already? David: Yes, but first let s put things in context. Today, when attackers exploit software vulnerabilities, they re primarily exploiting unintentional vulnerabilities that were created by the software developers. There are a lot of efforts to counter this: We re just starting to get better at this, which is good. However, attackers always try to attack the easiest target. As our deployed software has started to be hardened against attack, attackers have dramatically increased their attacks on the software supply chain (Sonatype found in 2022 that there s been a 742% increase year-over-year). The software supply chain hasn t historically gotten much attention, making it the easy target. There are simple supply chain attacks with simple solutions: Unfortunately, attackers know there are other lines of attack. One of the most dangerous is subverted build systems, as demonstrated by the subversion of SolarWinds Orion system. In a subverted build system, developers can review the software source code all day and see no problem, because there is no problem there. Instead, the process to convert source code into the code people run, called the build system , is subverted by an attacker. One solution for countering subverted build systems is to make the build systems harder to attack. That s a good thing to do, but you can never be confident that it was good enough . How can you be sure it s not subverted, if there s no way to know? A stronger defense against subverted build systems is the idea of verified reproducible builds. A build is reproducible if given the same source code, build environment and build instructions, any party can recreate bit-by-bit identical copies of all specified artifacts. A build is verified if multiple different parties verify that they get the same result for that situation. When you have a verified reproducible build, either all the parties colluded (and you could always double-check it yourself), or the build process isn t subverted. There is one last turtle: What if the build system tools or machines are subverted themselves? This is not a common attack today, but it s important to know if we can address them when the time comes. The good news is that we can address this. For some situations reproducible builds can also counter such attacks. If there s a loop (that is, a compiler is used to generate itself), that s called the trusting trust attack, and that is more challenging. Thankfully, the trusting trust attack has been known about for decades and there are known solutions. The diverse double-compiling (DDC) process that I explained in my PhD dissertation, as well as the bootstrappable builds process, can both counter trusting trust attacks in the software space. So there is no reason to lose hope: there is a bottom turtle , as it were.
Holger: Thankfully, this has all slowly started to change and supply chain issues are now widely discussed, as evident by efforts like Securing the Software Supply Chain: Recommended Practices Guide for Developers which you shared on our mailing list. In there, Reproducible Builds are mentioned as recommended advanced practice, which is both pretty cool (we ve come a long way!), but to me it also sounds like this will take another decade until it s become standard normal procedure. Do you agree on that timeline? David: I don t think there will be any particular timeframe. Different projects and ecosystems will move at different speeds. I wouldn t be surprised if it took a decade or so for them to become relatively common there are good reasons for that. Today the most common kinds of attacks based on software vulnerabilities still involve unintentional vulnerabilities in operational systems. Attackers are starting to apply supply chain attacks, but the top such attacks today are typosquatting (creating packages with similar names) and dependency confusion) (convincing projects to download packages from the wrong repositories). Reproducible builds don t counter those kinds of attacks, they counter subverted builds. It s important to eventually have verified reproducible builds, but understandably other issues are currently getting prioritized first. That said, reproducible builds are important long term. Many people are working on countering unintentional vulnerabilities and the most common kinds of supply chain attacks. As these other threats are countered, attackers will increasingly target build systems. Attackers always go for the weakest link. We will eventually need verified reproducible builds in many situations, and it ll take a while to get build systems able to widely perform reproducible builds, so we need to start that work now. That s true for anything where you know you ll need it but it will take a long time to get ready you need to start now.
Holger: What are your suggestions to accelerate adoption? David: Reproducible builds need to be: I think there s a snowball effect. Once many projects packages are reproducible, it will be easier to convince other projects to make their packages reproducible. I also think there should be some prioritization. If a package is in wide use (e.g., part of minimum set of packages for a widely-used Linux distribution or framework), its reproducibility should be a special focus. If a package is vital for supporting some societally important critical infrastructure (e.g., running dams), it should also be considered important. You can then work on the ones that are less important over time.
Holger: How is the Best Practices Badge going? How many projects are participating and how many are missing? David: It s going very well. You can see some automatically-generated statistics, showing we have over 5,000 projects, adding more than 1/day on average. We have more than 900 projects that have earned at least the passing badge level.
Holger: How many of the projects participating in the Best Practices badge engaging with reproducible builds? David: As of this writing there are 168 projects that report meeting the reproducible builds criterion. That s a relatively small percentage of projects. However, note that this criterion (labelled build_reproducible) is only required for the gold badge. It s not required for the passing or silver level badge. Currently we ve been strategically focused on getting projects to at least earn a passing badge, and less on earning silver or gold badges. We would love for all projects to get earn a silver or gold badge, of course, but our theory is that projects that can t even earn a passing badge present the most risk to their users. That said, there are some projects we especially want to see implementing higher badge levels. Those include projects that are very widely used, so that vulnerabilities in them can impact many systems. Examples of such projects include the Linux kernel and curl. In addition, some projects are used within systems where it s important to society that they not have serious security vulnerabilities. Examples include projects used by chemical manufacturers, financial systems and weapons. We definitely encourage any of those kinds of projects to earn higher badge levels.
Holger: Many thanks for this interview, David, and for all of your work at the Linux Foundation and elsewhere!




For more information about the Reproducible Builds project, please see our website at reproducible-builds.org. If you are interested in ensuring the ongoing security of the software that underpins our civilisation and wish to sponsor the Reproducible Builds project, please reach out to the project by emailing contact@reproducible-builds.org.

11 November 2022

Reproducible Builds: Reproducible Builds in October 2022

Welcome to the Reproducible Builds report for October 2022! In these reports we attempt to outline the most important things that we have been up to over the past month. As ever, if you are interested in contributing to the project, please visit our Contribute page on our website.

Our in-person summit this year was held in the past few days in Venice, Italy. Activity and news from the summit will therefore be covered in next month s report!
A new article related to reproducible builds was recently published in the 2023 IEEE Symposium on Security and Privacy. Titled Taxonomy of Attacks on Open-Source Software Supply Chains and authored by Piergiorgio Ladisa, Henrik Plate, Matias Martinez and Olivier Barais, their paper:
[ ] proposes a general taxonomy for attacks on opensource supply chains, independent of specific programming languages or ecosystems, and covering all supply chain stages from code contributions to package distribution.
Taking the form of an attack tree, the paper covers 107 unique vectors linked to 94 real world supply-chain incidents which is then mapped to 33 mitigating safeguards including, of course, reproducible builds:
Reproducible Builds received a very high utility rating (5) from 10 participants (58.8%), but also a high-cost rating (4 or 5) from 12 (70.6%). One expert commented that a reproducible build like used by Solarwinds now, is a good measure against tampering with a single build system and another claimed this is going to be the single, biggest barrier .

It was noticed this month that Solarwinds published a whitepaper back in December 2021 in order to:
[ ] illustrate a concerning new reality for the software industry and illuminates the increasingly sophisticated threats made by outside nation-states to the supply chains and infrastructure on which we all rely.
The 12-month anniversary of the 2020 Solarwinds attack (which SolarWinds Worldwide LLC itself calls the SUNBURST attack) was, of course, the likely impetus for publication.
Whilst collaborating on making the Cyrus IMAP server reproducible, Ellie Timoney asked why the Reproducible Builds testing framework uses two remarkably distinctive build paths when attempting to flush out builds that vary on the absolute system path in which they were built. In the case of the Cyrus IMAP server, these happened to be: Asked why they vary in three different ways, Chris Lamb listed in detail the motivation behind to each difference.
On our mailing list this month:
The Reproducible Builds project is delighted to welcome openEuler to the Involved projects page [ ]. openEuler is Linux distribution developed by Huawei, a counterpart to it s more commercially-oriented EulerOS.

Debian Colin Watson wrote about his experience towards making the databases generated by the man-db UNIX manual page indexing tool:
One of the people working on [reproducible builds] noticed that man-db s database files were an obstacle to [reproducibility]: in particular, the exact contents of the database seemed to depend on the order in which files were scanned when building it. The reporter proposed solving this by processing files in sorted order, but I wasn t keen on that approach: firstly because it would mean we could no longer process files in an order that makes it more efficient to read them all from disk (still valuable on rotational disks), but mostly because the differences seemed to point to other bugs.
Colin goes on to describe his approach to solving the problem, including fixing various fits of internal caching, and he ends his post with None of this is particularly glamorous work, but it paid off .
Vagrant Cascadian announced on our mailing list another online sprint to help clear the huge backlog of reproducible builds patches submitted by performing NMUs (Non-Maintainer Uploads). The first such sprint took place on September 22nd, but another was held on October 6th, and another small one on October 20th. This resulted in the following progress:
41 reviews of Debian packages were added, 62 were updated and 12 were removed this month adding to our knowledge about identified issues. A number of issue types were updated too. [1][ ]
Lastly, Luca Boccassi submitted a patch to debhelper, a set of tools used in the packaging of the majority of Debian packages. The patch addressed an issue in the dh_installsysusers utility so that the postinst post-installation script that debhelper generates the same data regardless of the underlying filesystem ordering.

Other distributions F-Droid is a community-run app store that provides free software applications for Android phones. This month, F-Droid changed their documentation and guidance to now explicitly encourage RB for new apps [ ][ ], and FC Stegerman created an extremely in-depth issue on GitLab concerning the APK signing block. You can read more about F-Droid s approach to reproducibility in our July 2022 interview with Hans-Christoph Steiner of the F-Droid Project. In openSUSE, Bernhard M. Wiedemann published his usual openSUSE monthly report.

Upstream patches The Reproducible Builds project detects, dissects and attempts to fix as many currently-unreproducible packages as possible. We endeavour to send all of our patches upstream where appropriate. This month, we wrote a large number of such patches, including:

diffoscope diffoscope is our in-depth and content-aware diff utility. Not only can it locate and diagnose reproducibility issues, it can provide human-readable diffs from many kinds of binary formats. This month, Chris Lamb prepared and uploaded versions 224 and 225 to Debian:
  • Add support for comparing the text content of HTML files using html2text. [ ]
  • Add support for detecting ordering-only differences in XML files. [ ]
  • Fix an issue with detecting ordering differences. [ ]
  • Use the capitalised version of Ordering consistently everywhere in output. [ ]
  • Add support for displaying font metadata using ttx(1) from the fonttools suite. [ ]
  • Testsuite improvements:
    • Temporarily allow the stable-po pipeline to fail in the CI. [ ]
    • Rename the order1.diff test fixture to json_expected_ordering_diff. [ ]
    • Tidy the JSON tests. [ ]
    • Use assert_diff over get_data and an manual assert within the XML tests. [ ]
    • Drop the ALLOWED_TEST_FILES test; it was mostly just annoying. [ ]
    • Tidy the tests/test_source.py file. [ ]
Chris Lamb also added a link to diffoscope s OpenBSD packaging on the diffoscope.org homepage [ ] and Mattia Rizzolo fix an test failure that was occurring under with LLVM 15 [ ].

Testing framework The Reproducible Builds project operates a comprehensive testing framework at tests.reproducible-builds.org in order to check packages and other artifacts for reproducibility. In October, the following changes were made by Holger Levsen:
  • Run the logparse tool to analyse results on the Debian Edu build logs. [ ]
  • Install btop(1) on all nodes running Debian. [ ]
  • Switch Arch Linux from using SHA1 to SHA256. [ ]
  • When checking Debian debstrap jobs, correctly log the tool usage. [ ]
  • Cleanup more task-related temporary directory names when testing Debian packages. [ ][ ]
  • Use the cdebootstrap-static binary for the 2nd runs of the cdebootstrap tests. [ ]
  • Drop a workaround when testing OpenWrt and coreboot as the issue in diffoscope has now been fixed. [ ]
  • Turn on an rm(1) warning into an info -level message. [ ]
  • Special case the osuosl168 node for running Debian bookworm already. [ ][ ]
  • Use the new non-free-firmware suite on the o168 node. [ ]
In addition, Mattia Rizzolo made the following changes:
  • Ensure that 2nd build has a merged /usr. [ ]
  • Only reconfigure the usrmerge package on Debian bookworm and above. [ ]
  • Fix bc(1) syntax in the computation of the percentage of unreproducible packages in the dashboard. [ ][ ][ ]
  • In the index_suite_ pages, order the package status to be the same order of the menu. [ ]
  • Pass the --distribution parameter to the pbuilder utility. [ ]
Finally, Roland Clobus continued his work on testing Live Debian images. In particular, he extended the maintenance script to warn when workspace directories cannot be deleted. [ ]
If you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. However, you can get in touch with us via:

24 June 2022

Reproducible Builds: Supporter spotlight: Hans-Christoph Steiner of the F-Droid project

The Reproducible Builds project relies on several projects, supporters and sponsors for financial support, but they are also valued as ambassadors who spread the word about our project and the work that we do. This is the fifth instalment in a series featuring the projects, companies and individuals who support the Reproducible Builds project. We started this series by featuring the Civil Infrastructure Platform project and followed this up with a post about the Ford Foundation as well as a recent ones about ARDC, the Google Open Source Security Team (GOSST) and Jan Nieuwenhuizen on Bootstrappable Builds, GNU Mes and GNU Guix. Today, however, we will be talking with Hans-Christoph Steiner from the F-Droid project.

Chris: Welcome, Hans! Could you briefly tell me about yourself? Hans: Sure. I spend most of my time trying to private communications software usable by everyone, designing interactive software with a focus on human perceptual capabilities and building networks with free software. I ve been involved in Debian since approximately 2008 and became an official Debian developer in 2011. In the little time left over from that, I sometimes compose music with computers from my home in Austria.
Chris: For those who have not heard of it before, what is the F-Droid project? Hans: F-Droid is a community-run app store that provides free software applications for Android phones. First and foremost our goal is to represent our users. In particular, we review all of the apps that we distribute, and these reviews not only check for whether something is definitely free software or not, we additionally look for ethical problems with applications as well issues that we call Anti-Features . Since the project began in 2010, F-Droid now offers almost 4,000 free-software applications. F-Droid is also a app store kit as well, providing all the tools that are needed to operate an free app store of your own. It also includes complete build and release tools for managing the process of turning app source code into published builds.
Chris: How exactly does F-Droid differ from the Google Play store? Why might someone use F-Droid over Google Play? Hans: One key difference to the Google Play Store is that F-Droid does not ship proprietary software by default. All apps shipped from f-droid.org are built from source on our own builders. This is partly because F-Droid is backed by the free software community; that is, people who have engaged in the free software community long before Android was conceived, and, in particular, share many if not all of its values. Using F-Droid will therefore feel very familiar to anyone familiar with a modern Linux distribution.
Chris: How do you describe reproducibility from the F-Droid point of view? Hans: All centralised software repositories are extremely tempting targets for exploitation by malicious third parties, and the kinds of personal or otherwise sensitive data on our phones only increase this risk. In F-Droid s case, not only could the software we distribute be theoretically replaced with nefarious copies, the build infrastructure that generates that software could be compromised as well. F-Droid having reproducible builds is extremely important as it allows us to verify that our build systems have not been compromised and distributing malware to our users. In particular, if an independent build infrastructure can produce precisely the same results from a build, then we can be reasonably sure that they haven t been compromised. Technically-minded users can also validate their builds on their own systems too, further increasing trust in our build infrastructure. (This can be performed using fdroid verify command.) Our signature & trust scheme means that F-Droid can verify that an app is 100% free software whilst still using the developer s original .APK file. More details about this may be found in our reproducibility documentation and on the page about our Verification Server.
Chris: How do you see F-Droid fitting into the rest of the modern security ecosystem? Hans: Whilst F-Droid inherits all of the social benefits of free software, F-Droid takes special care to respect your privacy as well we don t attempt to track your device in any way. In particular, you don t need an account to use the F-Droid client, and F-Droid doesn t send any device-identifying information to our servers other than its own version number. What is more, we mark all apps in our repository that track you, and users can choose to hide any apps that has been tagged with a specific Anti-Feature in the F-Droid settings. Furthermore, any personal data you decide to give us (such as your email address when registering for a forum account) goes no further than us as well, and we pledge that it will never be used for anything beyond merely maintaining your account.
Chris: What would fully reproducible mean to F-Droid? What it would look like if reproducibility was a solved problem ? Or, rather, what would be your ultimate reproducibility goals? Hans: In an ideal world, every application submitted to F-Droid would not only build reproducibly, but would come with a cryptographically-signed signature from the developer as well. Then we would only distribute an compiled application after a build had received a number of matching signatures from multiple, independent third parties. This would mean that our users were not placing their trust solely in software developers machines, and wouldn t be trusting our centralised build servers as well.
Chris: What are the biggest blockers to reaching this state? Are there any key steps or milestones to get there? Hans: Time is probably the main constraint to reaching this goal. Not only do we need system administrators on an ongoing basis but we also need to incorporate reproducibly checks into our Continuous Integration (CI) system. We are always looking for new developers to join our effort, as well as learning about how to better bring them up to speed. Separate to this, we often experience blockers with reproducibility-related bugs in the Android build tooling. Luckily, upstreams do ultimately address these issues, but in some cases this has taken three or four years to reach end-users and developers. Unfortunately, upstream is not chiefly concerned with the security aspects of reproducibility builds; they care more about how it can minimise and optimise download size and speed.
Chris: Are you tracking any statistics about reproducibility in F-Droid over time? If so, can you share them? Does F-Droid track statistics about its own usage? Hans: To underline a topic touched on above, F-Droid is dedicated to preserving the privacy of its users; we therefore do not record usage statistics. This is, of course, in contrast to other application stores. However, we are in a position to track whether packages in F-Droid are reproducible or not. As in: what proportion of APKs in F-Droid have been independently verified over time? Unfortunately, due to time constraints, we do not yet automatically publish charts for this. We do publish some raw data that is related, though, and we naturally welcome contributions of visualizations based on any and all of our data. The All our APIs page on our wiki is a good place to start for someone wanting to contribute, everything about reproducible F-Droid apps is available in JSON format, what s missing are apps or dashboards making use of the available raw data.
Chris: Many thanks for this interview, Hans, and for all of your work on F-Droid and elsewhere. If someone wanted to get in touch or learn more about the project, where might someone go? Hans: The best way to find out about F-Droid is, of course, the main F-Droid homepage, but we are also on Twitter @fdroidorg. We have many avenues to participate and to learn more! We have an About page on our website and a thriving forum. We also have part of our documentation specifically dedicated to reproducible builds.


For more information about the Reproducible Builds project, please see our website at reproducible-builds.org. If you are interested in ensuring the ongoing security of the software that underpins our civilisation and wish to sponsor the Reproducible Builds project, please reach out to the project by emailing contact@reproducible-builds.org.

6 June 2022

Reproducible Builds: Reproducible Builds in May 2022

Welcome to the May 2022 report from the Reproducible Builds project. In our reports we outline the most important things that we have been up to over the past month. As ever, if you are interested in contributing to the project, please visit our Contribute page on our website.

Repfix paper Zhilei Ren, Shiwei Sun, Jifeng Xuan, Xiaochen Li, Zhide Zhou and He Jiang have published an academic paper titled Automated Patching for Unreproducible Builds:
[..] fixing unreproducible build issues poses a set of challenges [..], among which we consider the localization granularity and the historical knowledge utilization as the most significant ones. To tackle these challenges, we propose a novel approach [called] RepFix that combines tracing-based fine-grained localization with history-based patch generation mechanisms.
The paper (PDF, 3.5MB) uses the Debian mylvmbackup package as an example to show how RepFix can automatically generate patches to make software build reproducibly. As it happens, Reiner Herrmann submitted a patch for the mylvmbackup package which has remained unapplied by the Debian package maintainer for over seven years, thus this paper inadvertently underscores that achieving reproducible builds will require both technical and social solutions.

Python variables Johannes Schauer discovered a fascinating bug where simply naming your Python variable _m led to unreproducible .pyc files. In particular, the types module in Python 3.10 requires the following patch to make it reproducible:
--- a/Lib/types.py
+++ b/Lib/types.py
@@ -37,8 +37,8 @@ _ag = _ag()
 AsyncGeneratorType = type(_ag)
 
 class _C:
-    def _m(self): pass
-MethodType = type(_C()._m)
+    def _b(self): pass
+MethodType = type(_C()._b)
Simply renaming the dummy method from _m to _b was enough to workaround the problem. Johannes bug report first led to a number of improvements in diffoscope to aid in dissecting .pyc files, but upstream identified this as caused by an issue surrounding interned strings and is being tracked in CPython bug #78274.

New SPDX team to incorporate build metadata in Software Bill of Materials SPDX, the open standard for Software Bill of Materials (SBOM), is continuously developed by a number of teams and committees. However, SPDX has welcomed a new addition; a team dedicated to enhancing metadata about software builds, complementing reproducible builds in creating a more secure software supply chain. The SPDX Builds Team has been working throughout May to define the universal primitives shared by all build systems, including the who, what, where and how of builds:
  • Who: the identity of the person or organisation that controls the build infrastructure.
  • What: the inputs and outputs of a given build, combining metadata about the build s configuration with an SBOM describing source code and dependencies.
  • Where: the software packages making up the build system, from build orchestration tools such as Woodpecker CI and Tekton to language-specific tools.
  • How: the invocation of a build, linking metadata of a build to the identity of the person or automation tool that initiated it.
The SPDX Builds Team expects to have a usable data model by September, ready for inclusion in the SPDX 3.0 standard. The team welcomes new contributors, inviting those interested in joining to introduce themselves on the SPDX-Tech mailing list.

Talks at Debian Reunion Hamburg Some of the Reproducible Builds team (Holger Levsen, Mattia Rizzolo, Roland Clobus, Philip Rinn, etc.) met in real life at the Debian Reunion Hamburg (official homepage). There were several informal discussions amongst them, as well as two talks related to reproducible builds. First, Holger Levsen gave a talk on the status of Reproducible Builds for bullseye and bookworm and beyond (WebM, 210MB): Secondly, Roland Clobus gave a talk called Reproducible builds as applied to non-compiler output (WebM, 115MB):

Supply-chain security attacks This was another bumper month for supply-chain attacks in package repositories. Early in the month, Lance R. Vick noticed that the maintainer of the NPM foreach package let their personal email domain expire, so they bought it and now controls foreach on NPM and the 36,826 projects that depend on it . Shortly afterwards, Drew DeVault published a related blog post titled When will we learn? that offers a brief timeline of major incidents in this area and, not uncontroversially, suggests that the correct way to ship packages is with your distribution s package manager .

Bootstrapping Bootstrapping is a process for building software tools progressively from a primitive compiler tool and source language up to a full Linux development environment with GCC, etc. This is important given the amount of trust we put in existing compiler binaries. This month, a bootstrappable mini-kernel was announced. Called boot2now, it comprises a series of compilers in the form of bootable machine images.

Google s new Assured Open Source Software service Google Cloud (the division responsible for the Google Compute Engine) announced a new Assured Open Source Software service. Noting the considerable 650% year-over-year increase in cyberattacks aimed at open source suppliers, the new service claims to enable enterprise and public sector users of open source software to easily incorporate the same OSS packages that Google uses into their own developer workflows . The announcement goes on to enumerate that packages curated by the new service would be:
  • Regularly scanned, analyzed, and fuzz-tested for vulnerabilities.
  • Have corresponding enriched metadata incorporating Container/Artifact Analysis data.
  • Are built with Cloud Build including evidence of verifiable SLSA-compliance
  • Are verifiably signed by Google.
  • Are distributed from an Artifact Registry secured and protected by Google.
(Full announcement)

A retrospective on the Rust programming language Andrew bunnie Huang published a long blog post this month promising a critical retrospective on the Rust programming language. Amongst many acute observations about the evolution of the language s syntax (etc.), the post beings to critique the languages approach to supply chain security ( Rust Has A Limited View of Supply Chain Security ) and reproducibility ( You Can t Reproduce Someone Else s Rust Build ):
There s some bugs open with the Rust maintainers to address reproducible builds, but with the number of issues they have to deal with in the language, I am not optimistic that this problem will be resolved anytime soon. Assuming the only driver of the unreproducibility is the inclusion of OS paths in the binary, one fix to this would be to re-configure our build system to run in some sort of a chroot environment or a virtual machine that fixes the paths in a way that almost anyone else could reproduce. I say almost anyone else because this fix would be OS-dependent, so we d be able to get reproducible builds under, for example, Linux, but it would not help Windows users where chroot environments are not a thing.
(Full post)

Reproducible Builds IRC meeting The minutes and logs from our May 2022 IRC meeting have been published. In case you missed this one, our next IRC meeting will take place on Tuesday 28th June at 15:00 UTC on #reproducible-builds on the OFTC network.

A new tool to improve supply-chain security in Arch Linux kpcyrd published yet another interesting tool related to reproducibility. Writing about the tool in a recent blog post, kpcyrd mentions that although many PKGBUILDs provide authentication in the context of signed Git tags (i.e. the ability to verify the Git tag was signed by one of the two trusted keys ), they do not support pinning, ie. that upstream could create a new signed Git tag with an identical name, and arbitrarily change the source code without the [maintainer] noticing . Conversely, other PKGBUILDs support pinning but not authentication. The new tool, auth-tarball-from-git, fixes both problems, as nearly outlined in kpcyrd s original blog post.

diffoscope diffoscope is our in-depth and content-aware diff utility. Not only can it locate and diagnose reproducibility issues, it can provide human-readable diffs from many kinds of binary formats. This month, Chris Lamb prepared and uploaded versions 212, 213 and 214 to Debian unstable. Chris also made the following changes:
  • New features:
    • Add support for extracting vmlinuz Linux kernel images. [ ]
    • Support both python-argcomplete 1.x and 2.x. [ ]
    • Strip sticky etc. from x.deb: sticky Debian binary package [ ]. [ ]
    • Integrate test coverage with GitLab s concept of artifacts. [ ][ ][ ]
  • Bug fixes:
    • Don t mask differences in .zip or .jar central directory extra fields. [ ]
    • Don t show a binary comparison of .zip or .jar files if we have observed at least one nested difference. [ ]
  • Codebase improvements:
    • Substantially update comment for our calls to zipinfo and zipinfo -v. [ ]
    • Use assert_diff in test_zip over calling get_data with a separate assert. [ ]
    • Don t call re.compile and then call .sub on the result; just call re.sub directly. [ ]
    • Clarify the comment around the difference between --usage and --help. [ ]
  • Testsuite improvements:
    • Test --help and --usage. [ ]
    • Test that --help includes the file formats. [ ]
Vagrant Cascadian added an external tool reference xb-tool for GNU Guix [ ] as well as updated the diffoscope package in GNU Guix itself [ ][ ][ ].

Distribution work In Debian, 41 reviews of Debian packages were added, 85 were updated and 13 were removed this month adding to our knowledge about identified issues. A number of issue types have been updated, including adding a new nondeterministic_ordering_in_deprecated_items_collected_by_doxygen toolchain issue [ ] as well as ones for mono_mastersummary_xml_files_inherit_filesystem_ordering [ ], extended_attributes_in_jar_file_created_without_manifest [ ] and apxs_captures_build_path [ ]. Vagrant Cascadian performed a rough check of the reproducibility of core package sets in GNU Guix, and in openSUSE, Bernhard M. Wiedemann posted his usual monthly reproducible builds status report.

Upstream patches The Reproducible Builds project detects, dissects and attempts to fix as many currently-unreproducible packages as possible. We endeavour to send all of our patches upstream where appropriate. This month, we wrote a large number of such patches, including:

Reproducible builds website Chris Lamb updated the main Reproducible Builds website and documentation in a number of small ways, but also prepared and published an interview with Jan Nieuwenhuizen about Bootstrappable Builds, GNU Mes and GNU Guix. [ ][ ][ ][ ] In addition, Tim Jones added a link to the Talos Linux project [ ] and billchenchina fixed a dead link [ ].

Testing framework The Reproducible Builds project runs a significant testing framework at tests.reproducible-builds.org, to check packages and other artifacts for reproducibility. This month, the following changes were made:
  • Holger Levsen:
    • Add support for detecting running kernels that require attention. [ ]
    • Temporarily configure a host to support performing Debian builds for packages that lack .buildinfo files. [ ]
    • Update generated webpages to clarify wishes for feedback. [ ]
    • Update copyright years on various scripts. [ ]
  • Mattia Rizzolo:
    • Provide a facility so that Debian Live image generation can copy a file remotely. [ ][ ][ ][ ]
  • Roland Clobus:
    • Add initial support for testing generated images with OpenQA. [ ]
And finally, as usual, node maintenance was also performed by Holger Levsen [ ][ ].

Misc news On our mailing list this month:

Contact If you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. However, you can get in touch with us via:

18 May 2022

Reproducible Builds: Supporter spotlight: Jan Nieuwenhuizen on Bootstrappable Builds, GNU Mes and GNU Guix

The Reproducible Builds project relies on several projects, supporters and sponsors for financial support, but they are also valued as ambassadors who spread the word about our project and the work that we do. This is the fourth instalment in a series featuring the projects, companies and individuals who support the Reproducible Builds project. We started this series by featuring the Civil Infrastructure Platform project and followed this up with a post about the Ford Foundation as well as a recent ones about ARDC and the Google Open Source Security Team (GOSST). Today, however, we will be talking with Jan Nieuwenhuizen about Bootstrappable Builds, GNU Mes and GNU Guix.
Chris Lamb: Hi Jan, thanks for taking the time to talk with us today. First, could you briefly tell me about yourself? Jan: Thanks for the chat; it s been a while! Well, I ve always been trying to find something new and interesting that is just asking to be created but is mostly being overlooked. That s how I came to work on GNU Guix and create GNU Mes to address the bootstrapping problem that we have in free software. It s also why I have been working on releasing Dezyne, a programming language and set of tools to specify and formally verify concurrent software systems as free software. Briefly summarised, compilers are often written in the language they are compiling. This creates a chicken-and-egg problem which leads users and distributors to rely on opaque, pre-built binaries of those compilers that they use to build newer versions of the compiler. To gain trust in our computing platforms, we need to be able to tell how each part was produced from source, and opaque binaries are a threat to user security and user freedom since they are not auditable. The goal of bootstrappability (and the bootstrappable.org project in particular) is to minimise the amount of these bootstrap binaries. Anyway, after studying Physics at Eindhoven University of Technology (TU/e), I worked for digicash.com, a startup trying to create a digital and anonymous payment system sadly, however, a traditional account-based system won. Separate to this, as there was no software (either free or proprietary) to automatically create beautiful music notation, together with Han-Wen Nienhuys, I created GNU LilyPond. Ten years ago, I took the initiative to co-found a democratic school in Eindhoven based on the principles of sociocracy. And last Christmas I finally went vegan, after being mostly vegetarian for about 20 years!
Chris: For those who have not heard of it before, what is GNU Guix? What are the key differences between Guix and other Linux distributions? Jan: GNU Guix is both a package manager and a full-fledged GNU/Linux distribution. In both forms, it provides state-of-the-art package management features such as transactional upgrades and package roll-backs, hermetical-sealed build environments, unprivileged package management as well as per-user profiles. One obvious difference is that Guix forgoes the usual Filesystem Hierarchy Standard (ie. /usr, /lib, etc.), but there are other significant differences too, such as Guix being scriptable using Guile/Scheme, as well as Guix s dedication and focus on free software.
Chris: How does GNU Guix relate to GNU Mes? Or, rather, what problem is Mes attempting to solve? Jan: GNU Mes was created to address the security concerns that arise from bootstrapping an operating system such as Guix. Even if this process entirely involves free software (i.e. the source code is, at least, available), this commonly uses large and unauditable binary blobs. Mes is a Scheme interpreter written in a simple subset of C and a C compiler written in Scheme, and it comes with a small, bootstrappable C library. Twice, the Mes bootstrap has halved the size of opaque binaries that were needed to bootstrap GNU Guix. These reductions were achieved by first replacing GNU Binutils, GNU GCC and the GNU C Library with Mes, and then replacing Unix utilities such as awk, bash, coreutils, grep sed, etc., by Gash and Gash-Utils. The final goal of Mes is to help create a full-source bootstrap for any interested UNIX-like operating system.
Chris: What is the current status of Mes? Jan: Mes supports all that is needed from R5RS and GNU Guile to run MesCC with Nyacc, the C parser written for Guile, for 32-bit x86 and ARM. The next step for Mes would be more compatible with Guile, e.g., have guile-module support and support running Gash and Gash Utils. In working to create a full-source bootstrap, I have disregarded the kernel and Guix build system for now, but otherwise, all packages should be built from source, and obviously, no binary blobs should go in. We still need a Guile binary to execute some scripts, and it will take at least another one to two years to remove that binary. I m using the 80/20 approach, cutting corners initially to get something working and useful early. Another metric would be how many architectures we have. We are quite a way with ARM, tinycc now works, but there are still problems with GCC and Glibc. RISC-V is coming, too, which could be another metric. Someone has looked into picking up NixOS this summer. How many distros do anything about reproducibility or bootstrappability? The bootstrappability community is so small that we don t need metrics, sadly. The number of bytes of binary seed is a nice metric, but running the whole thing on a full-fledged Linux system is tough to put into a metric. Also, it is worth noting that I m developing on a modern Intel machine (ie. a platform with a management engine), that s another key component that doesn t have metrics.
Chris: From your perspective as a Mes/Guix user and developer, what does reproducibility mean to you? Are there any related projects? Jan: From my perspective, I m more into the problem of bootstrapping, and reproducibility is a prerequisite for bootstrappability. Reproducibility clearly takes a lot of effort to achieve, however. It s relatively easy to install some Linux distribution and be happy, but if you look at communities that really care about security, they are investing in reproducibility and other ways of improving the security of their supply chain. Projects I believe are complementary to Guix and Mes include NixOS, Debian and on the hardware side the RISC-V platform shares many of our core principles and goals.
Chris: Well, what are these principles and goals? Jan: Reproducibility and bootstrappability often feel like the next step in the frontier of free software. If you have all the sources and you can t reproduce a binary, that just doesn t feel right anymore. We should start to desire (and demand) transparent, elegant and auditable software stacks. To a certain extent, that s always been a low-level intent since the beginning of free software, but something clearly got lost along the way. On the other hand, if you look at the NPM or Rust ecosystems, we see a world where people directly install binaries. As they are not as supportive of copyleft as the rest of the free software community, you can see that movement and people in our area are doing more as a response to that so that what we have continues to remain free, and to prevent us from falling asleep and waking up in a couple of years and see, for example, Rust in the Linux kernel and (more importantly) we require big binary blobs to use our systems. It s an excellent time to advance right now, so we should get a foothold in and ensure we don t lose any more.
Chris: What would be your ultimate reproducibility goal? And what would the key steps or milestones be to reach that? Jan: The ultimate goal would be to have a system built with open hardware, with all software on it fully bootstrapped from its source. This bootstrap path should be easy to replicate and straightforward to inspect and audit. All fully reproducible, of course! In essence, we would have solved the supply chain security problem. Our biggest challenge is ignorance. There is much unawareness about the importance of what we are doing. As it is rather technical and doesn t really affect everyday computer use, that is not surprising. This unawareness can be a great force driving us in the opposite direction. Think of Rust being allowed in the Linux kernel, or Python being required to build a recent GNU C library (glibc). Also, the fact that companies like Google/Apple still want to play us vs them , not willing to to support GPL software. Not ready yet to truly support user freedom. Take the infamous log4j bug everyone is using open source these days, but nobody wants to take responsibility and help develop or nurture the community. Not ecosystem , as that s how it s being approached right now: live and let live/die: see what happens without taking any responsibility. We are growing and we are strong and we can do a lot but if we have to work against those powers, it can become problematic. So, let s spread our great message and get more people involved!
Chris: What has been your biggest win? Jan: From a technical point of view, the full-source bootstrap has have been our biggest win. A talk by Carl Dong at the 2019 Breaking Bitcoin conference stated that connecting Jeremiah Orian s Stage0 project to Mes would be the holy grail of bootstrapping, and we recently managed to achieve just that: in other words, starting from hex0, 357-byte binary, we can now build the entire Guix system. This past year we have not made significant visible progress, however, as our funding was unfortunately not there. The Stage0 project has advanced in RISC-V. A month ago, though, I secured NLnet funding for another year, and thanks to NLnet, Ekaitz Zarraga and Timothy Sample will work on GNU Mes and the Guix bootstrap as well. Separate to this, the bootstrappable community has grown a lot from two people it was six years ago: there are now currently over 100 people in the #bootstrappable IRC channel, for example. The enlarged community is possibly an even more important win going forward.
Chris: How realistic is a 100% bootstrappable toolchain? And from someone who has been working in this area for a while, is solving Trusting Trust) actually feasible in reality? Jan: Two answers: Yes and no, it really depends on your definition. One great thing is that the whole Stage0 project can also run on the Knight virtual machine, a hardware platform that was designed, I think, in the 1970s. I believe we can and must do better than we are doing today, and that there s a lot of value in it. The core issue is not the trust; we can probably all trust each other. On the other hand, we don t want to trust each other or even ourselves. I am not, personally, going to inspect my RISC-V laptop, and other people create the hardware and do probably not want to inspect the software. The answer comes back to being conscientious and doing what is right. Inserting GCC as a binary blob is not right. I think we can do better, and that s what I d like to do. The security angle is interesting, but I don t like the paranoid part of that; I like the beauty of what we are creating together and stepwise improving on that.
Chris: Thanks for taking the time to talk to us today. If someone wanted to get in touch or learn more about GNU Guix or Mes, where might someone go? Jan: Sure! First, check out: I m also on Twitter (@janneke_gnu) and on octodon.social (@janneke@octodon.social).
Chris: Thanks for taking the time to talk to us today. Jan: No problem. :)


For more information about the Reproducible Builds project, please see our website at reproducible-builds.org. If you are interested in ensuring the ongoing security of the software that underpins our civilisation and wish to sponsor the Reproducible Builds project, please reach out to the project by emailing contact@reproducible-builds.org.

1 April 2022

Russell Coker: Converting to UEFI

When I got my HP ML110 Gen9 working as a workstation I initially was under the impression that boot wasn t supported on NVMe and booted it from USB. I found USB booting with legacy boot to be unreliable so decided to try EFI booting and noticed that the NVMe devices were boot candidates with UEFI. Making one of them bootable was more complex than expected because no-one seems to have documented such things. So here s my documentation, it s not great but this method has worked once for me. Before starting major partitioning work it s best to run parted -l and save the output to a file, that can allow you to recreate partitions if you corrupt them. One thing I m doing on systems I manage is putting @reboot /usr/sbin/parted -l > /root/parted.log in the root crontab, then when the system is backed up the backup server gets any recent changes to partitioning (I don t backup /var/log on all my systems). Firstly run parted on the device to create the EFI and /boot partitions, note that if you want to copy and paste from this you must do so one line at a time, a block paste seemed to confuse parted.
mklabel gpt
mkpart EFI fat32 1 99
mkpart boot ext3 99 300
toggle 1 boot
toggle 1 esp
p
# Model: CT1000P1SSD8 (nvme)
# Disk /dev/nvme1n1: 1000GB
# Sector size (logical/physical): 512B/512B
# Partition Table: gpt
# Disk Flags: 
#
# Number  Start   End     Size    File system  Name  Flags
#  1      1049kB  98.6MB  97.5MB  fat32        EFI   boot, esp
#  2      98.6MB  300MB   201MB   ext3         boot
q
Here are the commands needed to create the filesystems and install the necessary files. This is almost to the stage of being scriptable. Some minor changes need to be made to convert from NVMe device names to SATA/SAS but nothing serious.
mkfs.vfat /dev/nvme1n1p1
mkfs.ext3 -N 1000 /dev/nvme1n1p2
file -s /dev/nvme1n1p2   sed -e s/^.*UUID/UUID/ -e "s/ .*$/ \/boot ext3 noatime 0 1/" >> /etc/fstab
file -s /dev/nvme1n1p1   tr "[a-f]" "[A-F]"  sed -e s/^.*numBEr.0x/UUID=/ -e "s/, .*$/ \/boot\/efi vfat umask=0077 0 1/" >> /etc/fstab
# edit /etc/fstab to put a hyphen between the 2 groups of 4 chars for the VFAT filesystem UUID
mount /boot
mkdir -p /boot/efi /boot/grub
mount /boot/efi
mkdir -p /boot/efi/EFI/debian
apt install efibootmgr shim-unsigned grub-efi-amd64
cp /usr/lib/shim/* /usr/lib/grub/x86_64-efi/monolithic/grubx64.efi /boot/efi/EFI/debian
file -s /dev/nvme1n1p2   sed -e "s/^.*UUID=/search.fs_uuid /" -e "s/ .needs.*$/ root hd0,gpt2/" > /boot/efi/EFI/debian/grub.cfg
echo "set prefix=(\$root)'/boot/grub'" >> /boot/efi/EFI/debian/grub.cfg
echo "configfile \$prefix/grub.cfg" >> /boot/efi/EFI/debian/grub.cfg
grub-install
update-grub
If someone would like to make a script that can handle the different partition names of regular SCSI/SATA disks, NVMe, CCISS, etc then that would be great. It would be good to have a script in Debian that creates the partitions and sets up the EFI files. If you want to have a second bootable device then the following commands will copy a GPT partition table and give it new UUIDs, make very certain that $DISKB is the one you want to be wiped and refer to my previous mention of parted -l . Also note that parted has a rescue command which works very well.
sgdisk /dev/$DISKA -R /dev/$DISKB 
sgdisk -G /dev/$DISKB
To backup a GPT partition table run a command like this. Note that if sgdisk is told to backup a MBR partitioned disk it will say Found invalid GPT and valid MBR; converting MBR to GPT forma which is probably a viable way of converting MBR format to GPT.
sgdisk -b sda.bak /dev/sda

14 April 2021

Dirk Eddelbuettel: RcppArmadillo 0.10.4.0.0 on CRAN: New Upstream Plus

armadillo image Armadillo is a powerful and expressive C++ template library for linear algebra aiming towards a good balance between speed and ease of use with a syntax deliberately close to a Matlab. RcppArmadillo integrates this library with the R environment and language and is widely used by (currently) 852 other packages on CRAN. This new release brings us the just release Armadillo 10.4.0. Upstream moves at a speed that is a little faster than the cadence CRAN likes. We release RcppArmadillo 0.10.2.2.0 on March 9; and upstream 10.3.0 came out shortly thereafter. We aim to accomodate CRAN with (roughly) monthly (or less frequent) releases) so by the time we were ready 10.4.0 had just come out. As it turns, the full testing had a benefit. Among the (currently) 852 CRAN packages using RcppArmadillo, two were failing tests. This is due to a subtle, but important point. Early on we realized that it would be beneficial if the standard R control over random-number creation and seeding affected Armadillo too, which Conrad accomodated kindly with an optional RNG interface which RcppArmadillo supplies. With recent changes he made, the R side saw normally-distributed draws (via the Armadillo interface) changed, which lead to the two changes. All hail unit tests. So I mentioned this to Conrad, and with the usual Chicago-Brisbane time difference late my evening a fix was in my inbox. The CRAN upload was then halted as I had missed that due to other changes he had made random draws from a Gamma would now call std::rand() which CRAN flags. Another email to Brisbane, another late (one-line) fix back and all was good. We still encountered one package with an error but flagged this as internal to that package s setup, so Uwe let RcppArmadillo onto CRAN, I contacted that package s maintainer who was very receptive and a change should be forthcoming. So with all that we have 0.10.4.0.0 on CRAN giving us Armadillo 10.4.0. The full set of changes follows. As Armadillo 10.3.0 was not uploaded to CRAN, its changes are included too.

Changes in RcppArmadillo version 0.10.4.0.0 (2021-04-12)
  • Upgraded to Armadillo release 10.4.0 (Pressure Cooker)
    • faster handling of triangular matrices by log_det()
    • added log_det_sympd() for log determinant of symmetric positive matrices
    • added ARMA_WARN_LEVEL configuration option, to control the degree of emitted warning messages
    • reduced the default degree of warning messages, so that failed decompositions, failed saving/loading, etc, no longer emit warnings
  • Apply one upstream corrections for arma::randn draws when using alternative (here R) generator, and arma::randg.

Changes in RcppArmadillo version 0.10.3.0.0 (2021-03-10)
  • Upgraded to Armadillo release 10.3 (Sunrise Chaos)
    • faster handling of symmetric positive definite matrices by pinv()
    • expanded .save() / .load() for dense matrices to handle coord_ascii format
    • for out of bounds access, element accessors now throw the more nuanced std::out_of_range exception, instead of only std::logic_error
    • improved quality of random numbers

Courtesy of my CRANberries, there is a diffstat report relative to previous release. More detailed information is on the RcppArmadillo page. Questions, comments etc should go to the rcpp-devel mailing list off the R-Forge page. If you like this or other open-source work I do, you can sponsor me at GitHub.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

6 July 2020

Dirk Eddelbuettel: Rcpp 1.0.5: Several Updates

rcpp logo Right on the heels of the news of 2000 CRAN packages using Rcpp (and also hitting 12.5 of CRAN package, or one in eight), we are happy to announce release 1.0.5 of Rcpp. Since the ten-year anniversary and the 1.0.0 release release in November 2018, we have been sticking to a four-month release cycle. The last release has, however, left us with a particularly bad taste due to some rather peculiar interactions with a very small (but ever so vocal) portion of the user base. So going forward, we will change two things. First off, we reiterate that we have already made rolling releases. Each minor snapshot of the main git branch gets a point releases. Between release 1.0.4 and this 1.0.5 release, there were in fact twelve of those. Each and every one of these was made available via the drat repo, and we will continue to do so going forward. Releases to CRAN, however, are real work. If they then end up with as much nonsense as the last release 1.0.4, we think it is appropriate to slow things down some more so we intend to now switch to a six-months cycle. As mentioned, interim releases are always just one install.packages() call with a properly set repos argument away. Rcpp has become the most popular way of enhancing R with C or C++ code. As of today, 2002 packages on CRAN depend on Rcpp for making analytical code go faster and further, along with 203 in BioConductor. And per the (partial) logs of CRAN downloads, we are running steady at around one millions downloads per month. This release features again a number of different pull requests by different contributors covering the full range of API improvements, attributes enhancements, changes to Sugar and helper functions, extended documentation as well as continuous integration deplayment. See the list below for details.

Changes in Rcpp patch release version 1.0.5 (2020-07-01)
  • Changes in Rcpp API:
    • The exception handler code in #1043 was updated to ensure proper include behavior (Kevin in #1047 fixing #1046).
    • A missing Rcpp_list6 definition was added to support R 3.3.* builds (Davis Vaughan in #1049 fixing #1048).
    • Missing Rcpp_list 2,3,4,5 definition were added to the Rcpp namespace (Dirk in #1054 fixing #1053).
    • A further updated corrected the header include and provided a missing else branch (Mattias Ellert in #1055).
    • Two more assignments are protected with Rcpp::Shield (Dirk in #1059).
    • One call to abs is now properly namespaced with std:: (Uwe Korn in #1069).
    • String object memory preservation was corrected/simplified (Kevin in #1082).
  • Changes in Rcpp Attributes:
    • Empty strings are not passed to R CMD SHLIB which was seen with R 4.0.0 on Windows (Kevin in #1062 fixing #1061).
    • The short_file_name() helper function is safer with respect to temporaries (Kevin in #1067 fixing #1066, and #1071 fixing #1070).
  • Changes in Rcpp Sugar:
    • Two sample() objects are now standard vectors and not R_alloc created (Dirk in #1075 fixing #1074).
  • Changes in Rcpp support functions:
    • Rcpp.package.skeleton() adjusts for a (documented) change in R 4.0.0 (Dirk in #1088 fixing #1087).
  • Changes in Rcpp Documentation:
    • The pdf file of the earlier introduction is again typeset with bibliographic information (Dirk).
    • A new vignette describing how to package C++ libraries has been added (Dirk in #1078 fixing #1077).
  • Changes in Rcpp Deployment:
    • Travis CI unit tests now run a matrix over the versions of R also tested at CRAN (rel/dev/oldrel/oldoldrel), and coverage runs in parallel for a net speed-up (Dirk in #1056 and #1057).
    • The exceptions test is now partially skipped on Solaris as it already is on Windows (Dirk in #1065).
    • The default CI runner was upgraded to R 4.0.0 (Dirk).
    • The CI matrix spans R 3.5, 3.6, r-release and r-devel (Dirk).

Thanks to CRANberries, you can also look at a diff to the previous release. Questions, comments etc should go to the rcpp-devel mailing list off the R-Forge page. Bugs reports are welcome at the GitHub issue tracker as well (where one can also search among open or closed issues); questions are also welcome under rcpp tag at StackOverflow which also allows searching among the (currently) 2455 previous questions. If you like this or other open-source work I do, you can now sponsor me at GitHub. For the first year, GitHub will match your contributions.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

Next.